Abstract

Performing random walks in networks is a fundamental primitive that has found applications
in many areas of computer science, including distributed computing. In this paper, we focus on
the problem of sampling random walks efficiently in a distributed network and its applications.
Given bandwidth constraints, the goal is to minimize the number of rounds required to obtain
random walk samples.

All previous algorithms that compute a random walk sample of length $\ell$ as a subroutine
always do so naively, i.e., in $O(\ell)$ rounds. The main contribution of this paper is a fast distributed
algorithm for performing random walks whose time complexity is sublinear in the length of the
walk. Our algorithm performs a random walk of length $\ell$ in $\tilde{O}(\sqrt{\ell D})$ rounds ($\tilde{O}(\cdot)$ hides
polylog $n$ factors, where $n$ is the number of nodes in the network) with high probability on an
undirected network, where $D$ is the diameter of the network. For small diameter graphs, this is a
significant improvement over the naive $O(\ell)$ bound. Furthermore, our algorithm is optimal within
a poly-logarithmic factor as there exists a matching lower bound [50]. We further extend our
algorithms to efficiently perform $k$ independent random walks in $\tilde{O}(\sqrt{k\ell D} + k)$ rounds. We also
show that our algorithm can be applied to speed up the more general Metropolis-Hastings sampling.

Our random walk algorithms can be used to speed up distributed algorithms in applications
that use random walks as a subroutine. We present two main applications. First, we give a fast
distributed algorithm for computing a random spanning tree (RST) in an arbitrary (undirected
unweighted) network which runs in $\tilde{O}(\sqrt{m}D)$ rounds with high probability ($m$ is the number
of edges). Our second application is a fast decentralized algorithm for estimating mixing time
and related parameters of the underlying network. Our algorithm is fully decentralized and can
serve as a building block in the design of topologically-aware networks.

1 Introduction



Random walks play a central role in computer science, spanning a wide range of areas in both theory 
and practice. The focus of this paper is on random walks in networks, in particular, decentralized 
algorithms for performing random walks in arbitrary networks. Random walks are used as an integral
subroutine in a wide variety of network applications, including token management [32, 8, 14],
load balancing [34], small-world routing [40], search [56, 1, 12, 29, 43], information propagation and
gathering [9, 37], network topology construction [29, 41, 42], checking expansion [22], constructing
random spanning trees [10, 6, 5], monitoring overlays [48], group communication in ad hoc
networks [21], gathering and dissemination of information over a network [3], distributed construction
of expander networks [41], and peer-to-peer membership management [26, 57]. Random walks are
also very useful in providing uniform and efficient solutions to distributed control of dynamic
networks [11, 56]. Random walks are local and lightweight; moreover, they require little index or state
maintenance, which makes them especially attractive to self-organizing dynamic networks such as
Internet overlay and ad hoc wireless networks.

A key purpose of random walks in many of these network applications is to perform node
sampling. While the sampling requirements in different applications vary, whenever a true sample
is required from a random walk of a certain number of steps, typically all applications perform the
walk naively, by simply passing a token from one node to its neighbor; thus, performing a random
walk of length $\ell$ takes time linear in $\ell$.

In this paper, we present an optimal (within a poly-logarithmic factor) sublinear time (sublinear
in $\ell$) distributed random walk sampling algorithm that is significantly faster than the naive
algorithm when $\ell \gg D$. Our algorithm runs in $\tilde{O}(\sqrt{\ell D})$ rounds. This running time is optimal
(within a poly-logarithmic factor) since a matching lower bound was shown recently in [50].
We then present two key applications of our algorithm. The first is a fast distributed algorithm
for computing a random spanning tree, a fundamental problem that has been studied widely in
the classical setting (see e.g., [35] and references therein) and in some special cases in distributed
settings [6]. To the best of our knowledge, our algorithm gives the fastest known running time in
an arbitrary network. The second is devising efficient decentralized algorithms for computing
key global metrics of the underlying network: mixing time, spectral gap, and conductance. Such
algorithms can be useful building blocks in the design of topologically (self-)aware networks, i.e.,
networks that can monitor and regulate themselves in a decentralized fashion. For example, efficiently
computing the mixing time or the spectral gap allows the network to monitor its connectivity
and expansion properties.

1.1 Distributed Computing 

Consider an undirected, unweighted, connected $n$-node graph $G = (V, E)$. The network is modeled
by this graph, where vertices model the processors and edges model the links
between the processors. Suppose that every node (vertex) hosts a processor with unbounded
computational power, but with limited initial knowledge. The processors communicate by exchanging
messages via the links (henceforth, edges). The vertices have limited global knowledge; in particular,
each of them has its own local perspective of the network, which is confined to its immediate
neighborhood. Specifically, assume that each node is associated with a distinct identity number
from the set $\{1, 2, \ldots, \mathrm{poly}(n)\}$. At the beginning of the computation, each node $v$ accepts as
input its own identity number and the identity numbers of its neighbors in $G$. The node may
also accept some additional inputs as specified by the problem at hand. The nodes are allowed to
communicate through the edges of the graph G. The communication is synchronous, and occurs in 
discrete pulses, called rounds. In particular, all the nodes wake up simultaneously at the beginning 
of round 1. For convenience, our algorithms assume that nodes always know the number of the 
current round (although this is not really needed — cf. Section 2). 

We assume the CONGEST communication model, a widely used standard model to study
distributed algorithms [52]: a node $v$ can send an arbitrary message of size at most $O(\log n)$
through an edge per time step. (We note that if unbounded-size messages were allowed through
every edge in each time step, then the problems addressed here can be trivially solved in $O(D)$
time by collecting all information at one node, solving the problem locally, and then broadcasting
the results back to all the nodes [52].) The design of efficient algorithms for the CONGEST model
has been the subject of an active area of research called (locality-sensitive) distributed computing
(see [52] and references therein). It is straightforward to generalize our results to a CONGEST($B$)
model, where $O(B)$ bits can be transmitted in a single time step across an edge.

There are several measures of efficiency of distributed algorithms, but we will concentrate on one 
of them, specifically, the running time, that is, the number of rounds of distributed communication. 
(Note that the computation that is performed by the nodes locally is "free", i.e., it does not affect 
the number of rounds.) Many fundamental network problems such as minimum spanning tree, 
shortest paths, etc. have been addressed in this model (e.g., see [44, 52, 51]). In particular, there 
has been much research into designing very fast distributed approximation algorithms (that are 
even faster at the cost of producing sub-optimal solutions) for many of these problems (see e.g., 
[24, 23, 39, 38]). Such algorithms can be useful for large-scale resource-constrained and dynamic 
networks where running time is crucial. 

1.2 Problems 

We consider the following basic random walk problem. 

Computing One Random Walk where Destination Outputs Source. We are given an
arbitrary undirected, unweighted, and connected $n$-node network $G = (V, E)$ and a source node
$s$. The goal is to devise a distributed algorithm such that, in the end, some node $v$ outputs the
ID of $s$, where $v$ is a destination node picked according to the probability that it is the destination
of a random walk of length $\ell$ starting at $s$. For brevity, this problem will henceforth be simply
called Single Random Walk.

For clarity, observe that the following naive algorithm solves the above problem in $O(\ell)$ rounds:
The walk of length $\ell$ is performed by sending a token for $\ell$ steps, picking a random neighbor in
each step. Then, the destination node $v$ of this walk outputs the ID of $s$. Our goal is to perform
such sampling in significantly fewer rounds, i.e., in time that is sublinear in $\ell$. On
the other hand, we note that it can take too much time (as much as $\Theta(|E| + D)$ time) in the
CONGEST model to collect all the topological information at some node (and then compute the
walk locally).
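
The naive baseline described above can be sketched as a centralized simulation. This is only an illustration under stated assumptions: the adjacency-list dictionary `adj`, the function name `naive_random_walk`, and the 4-cycle example are hypothetical and not part of the paper; each loop iteration stands in for one communication round.

```python
import random

def naive_random_walk(adj, source, length, rng=None):
    """Simulate passing a token for `length` steps; one step models one round."""
    rng = rng or random.Random()
    node = source
    for _ in range(length):
        # Simple random walk: each neighbor is chosen with probability 1/deg(node).
        node = rng.choice(adj[node])
    # The destination is the node that would output the ID of the source.
    return node, source

# Hypothetical 4-cycle, used only for illustration.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
dest, src = naive_random_walk(cycle, source=0, length=5, rng=random.Random(1))
```

The loop runs exactly `length` times, which mirrors the linear round complexity that the rest of the paper aims to beat.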

We also consider the following variations and generalizations of the Single Random Walk problem.

1. k Random Walks, Destinations output Sources (k-RW-DoS): We have $k$ sources $s_1, s_2, \ldots, s_k$
(not necessarily distinct) and we want each of $k$ destinations to output the ID of its corresponding
source.

2. k Random Walks, Sources output Destinations (k-RW-SoD): Same as above but we want each
source to output the ID of its corresponding destination.

3. k Random Walks, Nodes know their Positions (k-RW-pos): Instead of outputting the ID of
source or destination, we want each node to know its position(s) in the random walk. That
is, for each $s_i$, if $v_1, v_2, \ldots, v_\ell$ (where $v_1 = s_i$) is the resultant random walk starting at $s_i$, we
want each node $v_j$ in the walk to know the number $j$ at the end of the process.

Throughout this paper, we assume the standard (simple) random walk: in each step, an edge
is taken from the current node $v$ with probability $1/\deg(v)$, where $\deg(v)$ is the degree of $v$. Our
goal is to output a true random sample from the $\ell$-walk distribution starting from $s$.

1.3 Motivation 

There are two key motivations for obtaining sublinear time bounds. The first is that in many
algorithmic applications, walks of length significantly greater than the network diameter are needed.
For example, this is necessary in both the applications presented later in the paper, namely
distributed computation of a random spanning tree (RST) and computation of mixing time. In the
RST algorithm, we need to perform a random walk of expected length $O(mD)$ (where $m$ is the
number of edges in the network). In decentralized computation of mixing time, we need to perform
walks of length at least the mixing time, which can be significantly larger than the diameter (e.g., in
a random geometric graph model [49], a popular model for ad hoc networks, the mixing time can be
larger than the diameter by a factor of $\Omega(\sqrt{n})$). More generally, many real-world communication
networks (e.g., ad hoc networks and peer-to-peer networks) have relatively small diameter, and random
walks of length at least the diameter are usually performed for many sampling applications,
i.e., $\ell \gg D$. It should be noted that if the network is rapidly mixing/expanding, which is sometimes
the case in practice, then sampling from walks of length $\ell \gg D$ is close to sampling from the steady
state (degree) distribution; this can be done in $O(D)$ rounds (note, however, that this gives only
an approximately close sample, not the exact sample for that length). However, such an approach
fails when $\ell$ is smaller than the mixing time.

The second motivation is understanding the time complexity of distributed random walks. 
Random walk is essentially a "global" problem which requires the algorithm to "traverse" the 
entire network. Classical global problems include the minimum spanning tree, shortest path, etc.
Network diameter is an inherent lower bound for such problems. Problems of this type raise the
basic question of whether $n$ (or $\ell$, as is the case here) time is essential, or whether the network
diameter $D$ is the inherent parameter. As pointed out in the work of [27], in the latter case, it would
be desirable to design algorithms that have a better complexity for graphs with low diameter.

Notation: Throughout the paper, we let $\ell$ be the length of the walks, $k$ be the number of walks,
$D$ be the network diameter, $\delta$ be the minimum node degree, $n$ be the number of nodes, and $m$ be
the number of edges in the network.
1.4 Our Results

A Fast Distributed Random Walk Algorithm. We present the first sublinear, time-optimal,
distributed algorithm for the 1-RW-DoS problem in arbitrary networks that runs in time $\tilde{O}(\sqrt{\ell D})$
with high probability^1, where $\ell$ is the length of the walk (the precise theorem is stated in Section
2). Our algorithm is randomized (Las Vegas type, i.e., it always outputs the correct result, but the
running time claimed is with high probability).

The high-level idea behind our algorithm is to "prepare" a few short walks in the beginning 
and carefully stitch these walks together later as necessary. If there are not enough short walks, we 
construct more of them on the fly. We overcome a key technical problem by showing how one can 
perform many short walks in parallel without causing too much congestion. 

Our algorithm exploits a certain key property of random walks. The key property is a bound on
the number of times any node is visited in an $\ell$-length walk, for any length $\ell = O(m^2)$. We prove
that w.h.p. any node $x$ is visited at most $\tilde{O}(\deg(x)\sqrt{\ell})$ times in an $\ell$-length walk from any starting
node ($\deg(x)$ is the degree of $x$). We then show that if only certain $\ell/\lambda$ special points of the walk
(called connector points) are observed, then any node is observed only $\tilde{O}(\deg(x)\sqrt{\ell}/\lambda)$ times. The
algorithm starts with all nodes performing short walks (of length uniformly random in the range $\lambda$
to $2\lambda$ for appropriately chosen $\lambda$) efficiently and simultaneously; here the randomly chosen lengths
play a crucial role in arguing about a suitable spread of the connector points. Subsequently, the
algorithm begins at the source and carefully stitches these walks together until $\ell$ steps are completed.

We note that the running time of our algorithm matches the unconditional lower bound recently
shown in [50]. Thus the running time of our algorithm is (essentially) the best possible (up to
polylogarithmic factors).

We also extend the result to give algorithms for computing $k$ random walks (from any $k$ sources,
not necessarily distinct) in $\tilde{O}(\min(\sqrt{k\ell D} + k, k + \ell))$ rounds. We note that the $k$ random walks
generated by our algorithm are independent (cf. Section 4.1). Computing $k$ random walks is useful
in many applications such as the one we present below on decentralized computation of mixing
time and related parameters. While the main requirement of our algorithms is to just obtain the
random walk samples (i.e., the end point of the $\ell$-step walk), our algorithms can regenerate the
entire walks such that each node knows its position(s) among the $\ell$ steps (the k-RW-pos problem).
Our algorithm can be extended to do this in the same number of rounds.

We finally present extensions of our algorithm to perform random walk according to the 
Metropolis-Hastings [31, 46] algorithm, a more general type of random walk with numerous
applications (e.g., [56]). The Metropolis-Hastings algorithm gives a way to define transition probabilities
so that a random walk converges to any desired distribution. An important special case is when 
the distribution is uniform. 
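
For the uniform special case, one standard textbook way to define such transition probabilities is to propose a uniformly random neighbor and accept the move with probability $\min(1, \deg(u)/\deg(v))$, staying put otherwise. The sketch below builds the resulting transition table with exact fractions; it is an illustration of that standard rule, not necessarily the exact formulation used later in the paper, and the function name, the `adj` dictionary, and the star graph are hypothetical.

```python
from fractions import Fraction

def mh_uniform_transition(adj):
    """Transition probabilities of a Metropolis-Hastings walk with uniform target.

    Assumes a simple graph (no self-loops, no parallel edges).
    """
    P = {u: {} for u in adj}
    for u, nbrs in adj.items():
        stay = Fraction(1)
        for v in nbrs:
            # Propose v with probability 1/deg(u), accept with min(1, deg(u)/deg(v));
            # the product equals 1/max(deg(u), deg(v)).
            p = Fraction(1, max(len(nbrs), len(adj[v])))
            P[u][v] = p
            stay -= p
        P[u][u] = stay  # rejected proposals keep the walk at u
    return P

# Hypothetical star graph: center 0 with three leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
P = mh_uniform_transition(star)
```

Because the resulting table is symmetric, the uniform distribution is stationary for it, whereas the simple random walk on the same star would favor the center in proportion to its degree.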

Remarks. While the message complexity is not the main focus of this paper, we note that our
improved running time comes at the cost of an increased message complexity compared with the naive
algorithm (we discuss this in Section 6). Our message complexity for computing a random walk
of length $\ell$ is $\tilde{O}(m\sqrt{\ell D} + n\sqrt{\ell/D})$, which can be worse than the naive algorithm's $O(\ell)$ message
complexity.

^1 Throughout this paper, "with high probability (whp)" means with probability at least $1 - 1/n^{\Omega(1)}$, where $n$ is
the number of nodes in the network.
Applications. Our faster distributed random walk algorithm can be used in speeding up
distributed applications where random walks arise as a subroutine. Such applications include
distributed construction of expander graphs, checking whether a graph is an expander, construction
of random spanning trees, and random-walk based search (we refer to [19] for details). Here, we
present two key applications:

(1) A Fast Distributed Algorithm for Random Spanning Trees (RST): We give an $\tilde{O}(\sqrt{m}D)$
time distributed algorithm (cf. Section 5.1) for uniformly sampling a random spanning tree in an
arbitrary undirected (unweighted) graph (i.e., each spanning tree in the underlying network has
the same probability of being selected). Spanning trees are fundamental network primitives and
distributed algorithms for various types of spanning trees such as minimum spanning tree (MST),
breadth-first spanning tree (BFS), shortest path tree, shallow-light trees, etc., have been studied
extensively in the literature [52]. However, not much is known about the distributed complexity
of the random spanning tree problem. The centralized case has been studied for many decades,
see e.g., the recent work of [35] and the references therein; also see the recent work of Goyal et
al. [30] which gives nice applications of RST to fault-tolerant routing and constructing expanders.
In the distributed computing context, the work of Bar-Ilan and Zernik [6] gives distributed RST
algorithms for two special cases, namely that of a complete graph (running in constant time) and
a synchronous ring (running in $O(n)$ time). The work of [5] gives a self-stabilizing distributed
algorithm for constructing an RST in a wireless ad hoc network and mentions that RST is more
resilient to transient failures that occur in mobile ad hoc networks.

Our algorithm works by giving an efficient distributed implementation of the well-known Aldous- 
Broder random walk algorithm [2, 10] for constructing an RST. 
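
For readers unfamiliar with the Aldous-Broder procedure, a minimal centralized sketch follows: walk until every node has been visited and keep, for each node other than the start, the edge through which it was first entered. This is the classical sequential version only, assuming a connected graph; it does not reflect the distributed implementation of Section 5.1, and the names `aldous_broder`, `adj`, and the example graph are hypothetical.

```python
import random

def aldous_broder(adj, start, rng=None):
    """Return the first-entrance edges of a random walk that covers the graph."""
    rng = rng or random.Random()
    visited = {start}
    tree = []
    node = start
    # Assumption: the graph is connected, so the walk eventually covers it.
    while len(visited) < len(adj):
        nxt = rng.choice(adj[node])
        if nxt not in visited:
            visited.add(nxt)
            tree.append((node, nxt))  # edge used to enter nxt for the first time
        node = nxt
    return tree

# Hypothetical graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
tree = aldous_broder(g, start=0, rng=random.Random(7))
```

The number of steps taken is the cover time of the walk, which is why a faster way of performing long walks translates into a faster RST algorithm.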

(2) Decentralized Computation of Mixing Time. We present a fast decentralized algorithm
for estimating mixing time, conductance and spectral gap of the network (cf. Section 5.2). In
particular, we show that given a starting point $x$, the mixing time with respect to $x$, called $\tau^x_{mix}$,
can be estimated in $\tilde{O}(n^{1/2} + n^{1/4}\sqrt{D\tau^x_{mix}})$ rounds. This gives an alternative algorithm to the
only previously known approach by Kempe and McSherry [36] that can be used to estimate $\tau^x_{mix}$
in $\tilde{O}(\tau^x_{mix})$ rounds.^2 To compare, we note that when $\tau^x_{mix} = \omega(n^{1/2})$ the present algorithm is faster
(assuming $D$ is not too large).

1.5 Related Work 

Random walks have been used in a wide variety of applications in distributed networks as mentioned 
in the beginning of Section 1. We describe here some of the applications in more detail. Our focus 
is to emphasize the papers of a more theoretical nature, and those that use random walks as one 
of the central subroutines. 

Speeding up distributed algorithms using random walks has been considered for a long time. 
Besides our approach of speeding up the random walk itself, one popular approach is to reduce the 
cover time. Recently, Alon et al. [4] show that performing several random walks in parallel reduces
the cover time in various types of graphs. They assert that the problem with performing random
walks is often the latency. In these scenarios where many walks are performed, our results could
help avoid too much latency and yield an additional speed-up factor. Other recent works involving
multiple random walks in different settings include Elsässer et al. [25], and Cooper et al. [13].

^2 Note that [36] in fact does more and gives a decentralized algorithm for computing the top $k$ eigenvectors of a
weighted adjacency matrix that runs in $O(\tau_{mix} \log^2 n)$ rounds if two adjacent nodes are allowed to exchange $O(k^3)$
messages per round, where $\tau_{mix}$ is the mixing time and $n$ is the size of the network.
A nice application of random walks is in the design and analysis of expanders. We mention
two results here. Law and Siu [41] consider the problem of constructing expander graphs in a
distributed fashion. One of the key subroutines in their algorithm is to perform several random
walks from specified source nodes. While the overall running time of their algorithm depends
on other factors, the specific step of computing random walk samples can be improved using the
techniques presented in this paper. Dolev and Tzachar [22] use random walks to check if a given
graph is an expander. The first algorithm given in [22] is essentially to run a random walk of length
$n \log n$ and mark every visited vertex. Later, it is checked whether every node has been visited.

Broder [10] and Wilson [55] gave algorithms to generate random spanning trees using random 
walks and Broder's algorithm was later applied to the network setting by Bar-Ilan and Zernik [6]. 
Recently Goyal et al. [30] show how to construct an expander/sparsifier using random spanning
trees. If their algorithm is implemented on a distributed network, the techniques presented in this 
paper would yield an additional speed-up in the random walk constructions. 

Morales and Gupta [48] discuss discovering a consistent and available monitoring overlay
for a distributed system. For each node, one needs to select and discover a list of nodes that
would monitor it. The monitoring set of nodes needs to satisfy some structural properties such
as consistency, verifiability, load balancing, and randomness, among others. This is where random
walks come in. Random walks are a natural way to discover a set of random nodes that are spread out
(and hence scalable), and that can in turn be used to monitor their local neighborhoods. Random walks
have been used for this purpose in another paper by Ganesh et al. [26] on peer-to-peer membership
management for gossip-based protocols.

The general high-level idea of using a few short walks in the beginning (executed in parallel)
and then carefully stitching these walks together later as necessary was introduced in [15] to find
random walks in data streams with the main motivation of computing PageRank. However, the
two models have very different constraints and motivations and hence the subsequent techniques 
used here and in [15] are very different. Recently, Sami and Twigg [53] consider lower bounds 
on the communication complexity of computing the stationary distribution of random walks in a 
network. Although their problem is related to our problem, the lower bounds obtained do not 
imply anything in our setting. 

The work of [28] discusses spectral algorithms for enhancing the topology awareness, e.g., by 
identifying and assigning weights to critical links. However, the algorithms are centralized, and 
it is mentioned that obtaining efficient decentralized algorithms is a major open problem. Our 
algorithms are fully decentralized and based on performing random walks, and so are more amenable 
to dynamic and self-organizing networks. 

Subsequent Work Since the publication of the conference versions of our papers [19, 20], 
additional results have been shown, extending our algorithms to various settings. 

The work of [50] showed a tight lower bound on the running time of distributed random walk
algorithms using techniques from communication complexity [16]. Specifically, it is shown in [50]
that for any $n$, $D$, and $D \le \ell \le (n/(D^3 \log n))^{1/4}$, performing a random walk of length $\Theta(\ell)$ on an
$n$-node network of diameter $D$ requires $\Omega(\sqrt{\ell D} + D)$ time. This shows that the running time of
our 1-RW-DoS algorithm is (essentially) the best possible (up to polylogarithmic factors).

In [18], it is shown how to improve the message complexity of the distributed random walk
algorithms presented in this paper. The main reason for the increased message complexity of our
algorithms is that to compute one long walk many short walks are generated, most of which go
unused. One idea is to use these unused short walks to compute other (independent) long walks.
This idea is explored in [18] where it is shown that under certain conditions (e.g., when the starting
point of the random walk is chosen proportional to the node degree), the overall message complexity
of computing many long walks can be made near-optimal.

The fast distributed random walk algorithms presented in this paper apply only to static
networks and not to dynamic networks. The recent work of [17] investigates efficient
distributed computation in dynamic networks in which the network topology changes (arbitrarily)
from round to round. The paper presents a rigorous framework for design and analysis of distributed
random walk sampling algorithms in dynamic networks. Building on the techniques developed in the
present paper, the main contribution of [17] is a fast distributed random walk sampling algorithm
that runs in $\tilde{O}(\sqrt{\tau\Phi})$ rounds (with high probability), where $\tau$ is the dynamic mixing time and $\Phi$ is
the dynamic diameter of the network, and returns a sample close to a suitably defined stationary
distribution of the dynamic network. This is then shown to be useful in designing a fast distributed
algorithm for information spreading in a dynamic network.

2 Algorithm for 1-RW-DoS 

In this section we describe the algorithm to sample one random walk destination. We show that
this algorithm takes $\tilde{O}(\sqrt{\ell D})$ rounds with high probability and extend it to other cases in the next
sections. First, we make the following simple observation, which will be assumed throughout.

Observation 2.1. We may assume that $\ell$ is $O(m^2)$, where $m$ is the number of edges in the network.

The reason is that if $\ell$ is $\Omega(m^2)$, the required bound of $\tilde{O}(\sqrt{\ell D})$ rounds is easily achieved by
aggregating the graph topology (via upcast) onto one node in $O(m + D)$ rounds (e.g., see [52]).
The difficulty lies in proving the case of $\ell = O(m^2)$.
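
The arithmetic behind this observation can be spelled out as follows. This is a sketch under the assumptions that the graph is connected with at least two nodes (so $1 \le D \le n - 1 \le m$) and that $\ell \ge c\,m^2$ for some constant $c > 0$; the constant $c$ is introduced here only for illustration.

```latex
\ell \ge c\,m^2
\;\Longrightarrow\;
\sqrt{\ell D} \;\ge\; \sqrt{c}\, m \sqrt{D} \;\ge\; \sqrt{c}\, m ,
\qquad\text{hence}\qquad
m + D \;\le\; 2m \;\le\; \frac{2}{\sqrt{c}}\,\sqrt{\ell D} .
```

So the $O(m + D)$ rounds spent on aggregating the topology are already within $O(\sqrt{\ell D})$ in this regime.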

A Slower Algorithm. Let us first consider a slower version of the algorithm to highlight the
fundamental idea used to achieve the sub-linear time bound. We will show that the slower algorithm
runs in time $\tilde{O}(\ell^{2/3}D^{1/3})$. The high-level idea (see Figure 1) is to perform "many" short random
walks in parallel and later stitch them together as needed. In particular, we perform the algorithm
in two phases, as follows.

In Phase 1, we perform $\eta$ "short" random walks of length $\lambda$ from each node $v$, where $\eta$ and
$\lambda$ are some parameters whose values will be fixed in the analysis. (We note that we will need
slightly more short walks when we develop a faster algorithm.) This is done naively by forwarding
$\eta$ "coupons" having the ID of $v$, from $v$ to random destinations^3, as follows.

1: Initially, each node $v$ creates $\eta$ messages (called coupons) $C_1, C_2, \ldots, C_\eta$ and writes its ID on
   them.
2: for $i = 1$ to $\lambda$ do
3:    This is the $i$-th iteration. Each node $v$ does the following: Consider each coupon $C$ held by
      $v$ which is received in the $(i-1)$-th iteration. (The zeroth iteration is the initial stage where
      each node creates its own messages.) Now $v$ picks a neighbor $u$ uniformly at random and
      forwards $C$ to $u$ after incrementing the counter on the coupon to $i$.
4: end for

^3 The term "coupon" refers to the same meaning as the more commonly used term of "token" but we use the term
coupon here and reserve the term token for the second phase.

At the end of the process, for each node $v$, there will be $\eta$ coupons containing $v$'s ID distributed
to some nodes in the network. These nodes are the destinations of short walks of length $\lambda$ starting
at $v$. We note that the notion of "phase" is used only for simplicity. The algorithm does not really
need round numbers. If there are many messages to be sent through the same edge, send one with
minimum counter first.
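
Phase 1 can be mimicked by the following centralized simulation. It is a sketch under simplifying assumptions: edge bandwidth and congestion, which the paper's analysis is largely about, are ignored, so one loop iteration here is not one CONGEST round. The names `phase1_short_walks`, `adj`, `eta`, and `lam` are hypothetical stand-ins for the graph and the parameters $\eta$ and $\lambda$.

```python
import random
from collections import defaultdict

def phase1_short_walks(adj, eta, lam, rng=None):
    """Return holder -> list of (owner_id, counter) coupons after lam iterations."""
    rng = rng or random.Random()
    held = defaultdict(list)
    for v in adj:
        # Iteration 0: every node creates eta coupons carrying its own ID.
        held[v] = [(v, 0) for _ in range(eta)]
    for i in range(1, lam + 1):
        nxt = defaultdict(list)
        for v, coupons in held.items():
            for owner, _ in coupons:
                u = rng.choice(adj[v])     # neighbor chosen uniformly at random
                nxt[u].append((owner, i))  # counter on the coupon becomes i
        held = nxt
    return held

# Hypothetical two-node path, used only for illustration.
path = {0: [1], 1: [0]}
held = phase1_short_walks(path, eta=2, lam=3, rng=random.Random(0))
```

After the call, the holder of a coupon with owner `v` is the endpoint of one independent short walk of length `lam` that started at `v`.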

For Phase 2, for the sake of exposition, let us first consider an easier version of the algorithm (that is
incomplete) which avoids some details. Starting at source $s$, we "stitch" some of the $\lambda$-length walks
prepared in Phase 1 together to form a longer walk. The algorithm starts from $s$ and randomly
picks one coupon distributed from $s$ in Phase 1. This can be accomplished by having every node
holding coupons of $s$ write its ID on the coupons and send them back to $s$. Then $s$
picks one of these coupons randomly and returns the rest to the owners. (However, aggregating all
coupons at $s$ is inefficient. The better way to do this is to use the idea of reservoir sampling [54].
We will develop an algorithm called Sample-Coupon to do this job efficiently later on.)
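
As background for the reservoir sampling idea mentioned above, the classical single-item version is sketched below: it picks a uniformly random element from a stream of unknown length while storing only one candidate. This is the textbook technique, not the paper's Sample-Coupon algorithm, and the function name is hypothetical.

```python
import random

def reservoir_sample_one(stream, rng=None):
    """Return a uniformly random item of `stream`, or None if it is empty."""
    rng = rng or random.Random()
    chosen = None
    for i, item in enumerate(stream, start=1):
        # Replace the current candidate with probability 1/i; by induction every
        # item seen so far is then the candidate with probability 1/i.
        if rng.randrange(i) == 0:
            chosen = item
    return chosen

coupon = reservoir_sample_one(["c1", "c2", "c3"], rng=random.Random(5))
```

The relevant property is that only a single candidate has to be kept at any time, which is what makes a sampling scheme of this kind attractive when each edge can carry only one small message per round.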

Let $C$ be the sampled coupon and $v$ be the destination node of $C$. The source $s$ then sends
a "token" to $v$ and $v$ deletes coupon $C$ (so that $C$ will not be sampled again next time). The
process then repeats. That is, the node $v$ currently holding the token samples one of the coupons
it distributed in Phase 1 and forwards the token to the destination of the sampled coupon, say $v'$.
(Nodes $v$, $v'$ are called "connectors"; they are the endpoints of the short walks that are stitched.)
A crucial observation is that the walks of length $\lambda$ used to distribute the corresponding coupons
from $s$ to $v$ and from $v$ to $v'$ are independent random walks. Therefore, we can stitch them to get
a random walk of length $2\lambda$. (This fact will be formally proved in the next section.) We therefore
can generate a random walk of length $3\lambda, 4\lambda, \ldots$ by repeating this process. We do this until we have
completed more than $\ell - \lambda$ steps. Then, we complete the rest of the walk by running the naive
random walk algorithm. The algorithm for Phase 2 is thus the following.

1: The source node $s$ creates a message called "token" which contains the ID of $s$.
2: while the length of the walk completed is at most $\ell - \lambda$ do
3:    Let $v$ be the node that is currently holding the token.
4:    $v$ calls Sample-Coupon($v$) to sample one of the coupons distributed by $v$ (in Phase 1)
      uniformly at random. Let $C$ be the sampled coupon.
5:    Let $v'$ be the node holding coupon $C$. (The ID of $v'$ is written on $C$.)
6:    $v$ sends the token to $v'$ and $v'$ deletes $C$ so that $C$ will not be sampled again.
7:    The length of the walk completed has now increased by $\lambda$.
8: end while
9: Walk naively (i.e., forward the token to a random neighbor) until $\ell$ steps are completed.
10: A node holding the token outputs the ID of $s$.

Figure 1 illustrates the idea of this algorithm. To understand the intuition behind this (incomplete) algorithm, let us analyze its running time. First, we claim that Phase 1 needs $\tilde{O}(\eta\lambda)$ rounds with high probability. This is because if we send out $\deg(v)$ coupons from each node $v$ at the same time, each edge should receive two coupons in the average case. In other words, there is essentially no congestion (i.e., not too many coupons are sent through the same edge). Therefore sending out (just) one coupon from each node for $\lambda$ steps will take $O(\lambda)$ rounds in expectation and the time becomes $O(\eta\lambda)$ for $\eta$ coupons. This argument can be modified to show that we need $\tilde{O}(\eta\lambda)$ rounds with high probability. (The full proof will be provided in Lemma 3.2 in the next section.) We will also show that Sample-Coupon can be done in $O(D)$ rounds and it follows that Phase 2 needs $O(D \cdot \ell/\lambda)$ rounds. Therefore, the algorithm needs $\tilde{O}(\eta\lambda + D \cdot \ell/\lambda)$ rounds, which is $\tilde{O}(\sqrt{\ell D})$ when we set $\eta = 1$ and $\lambda = \sqrt{\ell D}$.

Figure 1: Figure illustrating the algorithm of stitching short walks together.
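The parameter choice in the preceding paragraph can be checked by a short calculation. The following is only a sketch: it ignores constants and polylogarithmic factors, fixes the number of coupons per node to one, and treats the two terms of the round bound as exact costs.

```latex
% Round bound with one coupon per node, as a function of the short-walk length \lambda:
\[
  f(\lambda) \;=\; \lambda + \frac{D\,\ell}{\lambda}
  \;\ge\; 2\sqrt{\lambda \cdot \frac{D\,\ell}{\lambda}}
  \;=\; 2\sqrt{\ell D}
  \qquad \text{(AM-GM inequality)},
\]
% with equality exactly when the two terms coincide:
\[
  \lambda \;=\; \frac{D\,\ell}{\lambda}
  \quad\Longleftrightarrow\quad
  \lambda \;=\; \sqrt{\ell D}.
\]
```

So balancing the cost of preparing short walks against the cost of stitching them is what produces the square-root bound.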

The reason the above algorithm for Phase 2 is incomplete is that it is possible that $\eta$ coupons are not enough: We might forward the token to some node $v$ many times in Phase 2 and all coupons distributed by $v$ in the first phase may get deleted. In other words, $v$ is chosen as a connector node many times, and all its coupons have been exhausted. If this happens then the stitching process cannot progress. To cope with this problem, we develop an algorithm called Send-More-Coupons to distribute more coupons. In particular, when there is no coupon of $v$ left in the network and $v$ wants to sample a coupon, it calls Send-More-Coupons to send out $\eta$ new coupons to random nodes. (Send-More-Coupons gives the same result as Phase 1 but the algorithm will be different in order to get a good running time.) In particular, we insert the following lines between Line 4 and 5 of the previous algorithm.

1: if $C$ = NULL (all coupons from $v$ have already been deleted) then
2:     $v$ calls Send-More-Coupons($v$, $\eta$, $\lambda$) (Distribute $\eta$ new coupons. These coupons are forwarded for $\lambda$ rounds.)
3:     $v$ calls Sample-Coupon($v$) and let $C$ be the returned coupon.
4: end if

To complete this algorithm we now describe Sample-Coupon and Send-More-Coupons. The main idea of algorithm Sample-Coupon is to sample the coupons through a BFS (breadth-first search) tree from the leaves upward to the root. We allow each node to send only one coupon to its parent to avoid congestion. That is, in each round some node $u$ will receive some coupons from its children (at most one from each child). Let these children be $u_1, u_2, \ldots, u_q$. Then, $u$ picks one of these coupons and sends it to its parent. To ensure that $u$ picks a coupon with uniform distribution, it picks the coupon received from $u_i$ with probability proportional to the number of coupons in the subtree rooted at $u_i$. The precise statement of this algorithm can be found in Algorithm 1. The correctness of this algorithm (i.e., it outputs a coupon chosen uniformly at random) will be proved in the next section (cf. Claim 3.8).
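The bottom-up weighted sampling described above can be sketched in a few lines. This is a centralized illustration under stated assumptions, not the distributed protocol: the BFS tree is given as a hypothetical `children` map, coupon holdings as a hypothetical `own_count` map, and recursion replaces the level-by-level rounds.

```python
import random

def sample_coupon(children, own_count, root, rng):
    """Bottom-up weighted sampling on a rooted tree.

    children[u] lists u's children in the tree; own_count[u] is the number of
    coupons held at u. Returns the holder of the sampled coupon, or None if
    no coupon exists.
    """
    def visit(u):
        # Candidates are (holder, count) pairs: u's own coupons plus one
        # sample from each child subtree that contains any coupon.
        candidates = []
        if own_count.get(u, 0) > 0:
            candidates.append((u, own_count[u]))
        for child in children.get(u, []):
            sample = visit(child)
            if sample is not None:
                candidates.append(sample)
        if not candidates:
            return None
        total = sum(count for _, count in candidates)
        # Pick a candidate with probability proportional to its count and
        # pass it upward together with the total count of the subtree.
        pick = rng.randrange(total)
        for holder, count in candidates:
            if pick < count:
                return (holder, total)
            pick -= count

    result = visit(root)
    return None if result is None else result[0]
```

Passing the subtree total upward is what keeps the final choice uniform over all coupons: a holder with many coupons deep in the tree is not penalized for having to survive several rounds of selection.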

The Send-More-Coupons algorithm does essentially the same as what we did in Phase 1 with only one exception: Since this time we send out coupons from only one node, we can avoid congestion by combining coupons delivered on the same edge in each round. This algorithm is described in Algorithm 2, Part 1. (We will describe Part 2 later after we explain how to speed up the algorithm.)

The analysis in the next section shows that Send-More-Coupons is called at most $\ell/(\eta\lambda)$ times in the worst case and it follows that the algorithm above takes time $\tilde{O}(\ell^{2/3}D^{1/3})$.

Faster algorithm. We are now ready to introduce the second idea which will complete the algorithm. (The complete algorithm is described in Algorithm 3.) To speed up the above slower algorithm, we pick the length of each short walk uniformly at random in the range $[\lambda, 2\lambda - 1]$, instead of fixing it to $\lambda$. The reason behind this is that we want every node in the walk to have some probability to take part in token forwarding in Phase 2.

ALGORITHM 1: Sample-Coupon($v$)
Input: Starting node $v$.
Output: A node sampled from among the nodes holding the coupon of $v$.
1: Construct a Breadth-First-Search (BFS) tree rooted at $v$. While constructing, every node stores its parent's ID. Denote such a tree by $T$.
2: We divide $T$ naturally into levels 0 through $D$ (where nodes in level $D$ are leaf nodes and the root node $v$ is in level 0).
3: Every node $u$ that holds some coupons of $v$ picks one coupon uniformly at random. Let $C_0$ denote such a coupon and let $x_0$ denote the number of coupons $u$ has. Node $u$ writes its ID on coupon $C_0$.
4: for $i = D$ down to 0 do
5:     Every node $u$ in level $i$ that either receives coupon(s) from children or possesses coupon(s) itself does the following.
6:     Let $u$ have $q$ coupons (including its own coupons). Denote these coupons by $C_0, C_1, C_2, \ldots, C_{q-1}$ and let their counts be $x_0, x_1, x_2, \ldots, x_{q-1}$. Node $u$ samples one of $C_0$ through $C_{q-1}$, with probabilities proportional to the respective counts. That is, for any $0 \le j \le q-1$, $C_j$ is sampled with probability $x_j/(x_0 + x_1 + \cdots + x_{q-1})$.
7:     The sampled coupon is sent to the parent node (unless already at root) along with a count of $x_0 + x_1 + \cdots + x_{q-1}$ (the count represents the number of coupons from which this coupon has been sampled).
8: end for
9: The root outputs the ID of the owner of the final sampled coupon (written on such a coupon).

For example, consider running our random walk algorithm on a star network starting at the center and let $\lambda = 2$. If all short walks have length two then the center will always forward the token to itself in Phase 2. In other words, the center is the only connector and thus will appear as a connector $\ell/2$ times. This is undesirable since we have to prepare many walks from the center. In contrast, if we randomize the length of each short walk between two and three then the number of times that the center is a connector is $\ell/4$ in expectation. (To see this, observe that, regardless of where the token started, the token will be forwarded to the center with probability 1/2.)
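The star example can be reproduced with a small simulation. This is a sketch under stated assumptions: the function name and the `leaves` parameter are hypothetical, and the walk is generated step by step and then cut into short walks, which has the same effect on where the cut points (connectors) fall.

```python
import random

def connector_visits_to_center(ell, lengths, rng, leaves=5):
    """Walk ell steps on a star (center 0, leaves 1..leaves), cut the walk
    into short walks whose lengths are drawn from `lengths`, and count how
    many cut points (connectors) fall on the center."""
    position, step, hits = 0, 0, 0
    next_cut = rng.choice(lengths)
    while step < ell:
        # From the center move to a random leaf; from a leaf return to the center.
        position = rng.randrange(1, leaves + 1) if position == 0 else 0
        step += 1
        if step == next_cut:
            hits += int(position == 0)
            next_cut += rng.choice(lengths)
    return hits

rng = random.Random(7)
fixed = connector_visits_to_center(1000, [2], rng)          # every cut is the center
randomized = connector_visits_to_center(1000, [2, 3], rng)  # about half of the cuts
```

With fixed length two, every one of the 500 cut points lands on the center. With lengths drawn from {2, 3}, each cut point lands on the center with probability 1/2, so the center is used far less often.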

In the next section, we will show an important property which says that a random walk of length $\ell = O(m^2)$ will visit each node $v$ at most $\tilde{O}(\sqrt{\ell}\deg(v))$ times. We then use the above modification to claim that each node will be visited as a connector only $\tilde{O}(\sqrt{\ell}\deg(v)/\lambda)$ times. This implies that each node does not have to prepare too many short walks, which leads to the improved running time.

To do this modification, we need to modify Phase 1 and Send-More-Coupons. For Phase 1, we simply change the length of each short walk to $\lambda + r$ where $r$ is a random integer in $[0, \lambda - 1]$. This modification is shown in Algorithm 3. A very slight change is also made on Phase 2. For a technical reason, we also prepare $\eta\deg(v)$ coupons from each node in Phase 1, instead of previously $\eta$ coupons. Our analysis in the next section shows that this modification still needs $\tilde{O}(\eta\lambda)$ rounds as before.

To modify Send-More-Coupons, we add Part 2 to the algorithm (as in Algorithm 2) where we keep forwarding each coupon with some probability. It can be shown by a simple calculation that the number of steps each coupon is forwarded is uniformly between $\lambda$ and $2\lambda - 1$.

We now have the complete description of the algorithm (Algorithm 3) and are ready to show the analysis.

ALGORITHM 2: Send-More-Coupons($v$, $\eta$, $\lambda$)

Part 1: Distribute $\eta$ new coupons for $\lambda$ steps.
1: The node $v$ constructs $\eta$ (identical) messages containing its ID. We refer to these messages as new coupons.
2: for $j = 1$ to $\lambda$ do
3:     Each node $u$ does the following:
4:     - For each new coupon $C$ held by $u$, node $u$ picks a neighbor $z$ uniformly at random as a receiver of $C$.
5:     - For each neighbor $z$ of $u$, node $u$ sends the ID of $v$ and the number of new coupons for which $z$ is picked as a receiver, denoted by $c(u, v)$.
6:     - Each neighbor $z$ of $u$, upon receiving the ID of $v$ and $c(u, v)$, constructs $c(u, v)$ new coupons, each containing the ID of $v$.
7: end for

Part 2: Each coupon has now been forwarded for $\lambda$ steps. These coupons are now extended probabilistically further by $r$ steps where each $r$ is independent and uniform in the range $[0, \lambda - 1]$.
1: for $i = 0$ to $\lambda - 1$ do
2:     For each coupon, independently with probability $\frac{1}{\lambda - i}$, stop sending the coupon further and save the ID of the source node (in this event, the node with the message is the destination). For each coupon that is not stopped, each node picks a neighbor correspondingly and sends the coupon forward as before.
3: end for
4: At the end, each destination node knows the source ID as well as the number of times the corresponding coupon has been forwarded.
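Part 1 of Algorithm 2 can be sketched as follows. This is a centralized illustration under stated assumptions: `graph` is a hypothetical adjacency-list dictionary, the function name is made up for this example, and each entry of `messages` stands for the single (ID, count) message that crosses one directed edge in a round.

```python
import random
from collections import Counter

def send_more_coupons_part1(graph, v, eta, lam, rng):
    """Forward eta coupons of v for lam rounds.

    Returns a Counter mapping each node to the number of coupons it holds
    at the end.
    """
    holders = Counter({v: eta})
    for _round in range(lam):
        # messages[(u, z)] is the one count that u sends to neighbor z this round.
        messages = Counter()
        for u, held in holders.items():
            for _coupon in range(held):
                messages[(u, rng.choice(graph[u]))] += 1
        # Every receiver rebuilds that many coupons from the count alone.
        holders = Counter()
        for (_u, z), count in messages.items():
            holders[z] += count
    return holders
```

Because the coupons are identical, a count carries the same information as the coupons themselves, so no edge ever carries more than one message per round regardless of how large the number of coupons is.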



3 Analysis of Single-Random-Walk

We divide the analysis into four parts. First, we show the correctness of Algorithm Single-Random-Walk. (The proofs of the following lemmas will be shown in subsequent sections.)

Lemma 3.1. Algorithm Single-Random-Walk solves 1-RW-DoS. That is, for any node $v$, after algorithm Single-Random-Walk finishes, the probability that $v$ outputs the ID of $s$ is equal to the probability that it is the destination of a random walk of length $\ell$ starting at $s$.

Once we have established the correctness, we focus on the running time. In the second part, we show the probabilistic bound of Phase 1.

Lemma 3.2. Phase 1 finishes in $\tilde{O}(\lambda\eta)$ rounds with high probability.

In the third part, we analyze the worst case bound of Phase 2, which is a building block of the probabilistic bound of Phase 2.

Lemma 3.3. Phase 2 finishes in $O(\frac{\ell D}{\lambda} + \frac{\ell}{\eta})$ rounds.

We note that the above bound holds even when we fix the length of the short walks (instead of randomly picking from $[\lambda, 2\lambda - 1]$). Moreover, using the above lemmas we can conclude the (weaker) running time of $\tilde{O}(\ell^{2/3}D^{1/3})$ by setting $\eta$ and $\lambda$ appropriately, as follows.

Corollary 3.4. For any $\ell$, Algorithm Single-Random-Walk (cf. Algorithm 3) solves 1-RW-DoS correctly and, with high probability, finishes in $\tilde{O}(\ell^{2/3}D^{1/3})$ rounds.

Proof. Set $\eta = \ell^{1/3}/D^{1/3}$ and $\lambda = \ell^{1/3}D^{2/3}$. With these values, each of $\lambda\eta$, $\ell D/\lambda$ and $\ell/\eta$ equals $\ell^{2/3}D^{1/3}$. Using Lemma 3.2 and 3.3, the algorithm finishes in $\tilde{O}(\lambda\eta + \frac{\ell D}{\lambda} + \frac{\ell}{\eta}) = \tilde{O}(\ell^{2/3}D^{1/3})$ rounds with high probability. □

ALGORITHM 3: Single-Random-Walk($s$, $\ell$)
Input: Starting node $s$, desired walk length $\ell$ and parameters $\lambda$ and $\eta$.
Output: A destination node of the random walk of length $\ell$ outputs the ID of $s$.

Phase 1: Generate short walks by coupon distribution. Each node $v$ performs $\eta\deg(v)$ random walks of length $\lambda + r_i$ where $r_i$ (for each $1 \le i \le \eta\deg(v)$) is chosen independently and uniformly at random in the range $[0, \lambda - 1]$. (We note that random numbers generated by different nodes are different.) At the end of the process, there are $\eta\deg(v)$ (not necessarily distinct) nodes holding a "coupon" containing the ID of $v$.
1: for each node $v$ do
2:     Generate $\eta\deg(v)$ random integers in the range $[0, \lambda - 1]$, denoted by $r_1, r_2, \ldots, r_{\eta\deg(v)}$.
3:     Construct $\eta\deg(v)$ messages containing its ID and in addition, the $i$-th message contains the desired walk length of $\lambda + r_i$. We will refer to these messages created by node $v$ as "coupons created by $v$".
4: end for
5: for $i = 1$ to $2\lambda$ do
6:     This is the $i$-th iteration. Each node $v$ does the following: Consider each coupon $C$ held by $v$ which is received in the $(i-1)$-th iteration. (The zeroth iteration is the initial stage where each node creates its own messages.) If the coupon $C$'s desired walk length is at most $i$, then $v$ keeps this coupon ($v$ is the desired destination). Else, $v$ picks a neighbor $u$ uniformly at random and forwards $C$ to $u$.
7: end for

Phase 2: Stitch short walks by token forwarding. Stitch $\Theta(\ell/\lambda)$ walks, each of length in $[\lambda, 2\lambda - 1]$.
1: The source node $s$ creates a message called "token" which contains the ID of $s$.
2: The algorithm will forward the token around and keep track of a set of connectors, denoted by $\mathcal{C}$. Initially, $\mathcal{C} = \{s\}$.
3: while the length of the walk completed is at most $\ell - 2\lambda$ do
4:     Let $v$ be the node that is currently holding the token.
5:     $v$ calls Sample-Coupon($v$) to uniformly sample one of the coupons distributed by $v$. Let $C$ be the sampled coupon.
6:     if $C$ = NULL (all coupons from $v$ have already been deleted) then
7:         $v$ calls Send-More-Coupons($v$, $\eta$, $\lambda$) (Perform $\Theta(\eta)$ walks of length $\lambda + r_i$ starting at $v$, where $r_i$ is chosen uniformly at random in the range $[0, \lambda - 1]$ for the $i$-th walk.)
8:         $v$ calls Sample-Coupon($v$) and let $C$ be the returned value.
9:     end if
10:     Let $v'$ be the node holding coupon $C$. (The ID of $v'$ is written on $C$.)
11:     $v$ sends the token to $v'$, and $v'$ deletes $C$ so that $C$ will not be sampled again.
12:     $\mathcal{C} = \mathcal{C} \cup \{v'\}$
13: end while
14: Walk naively until $\ell$ steps are completed (this is at most another $2\lambda$ steps).
15: A node holding the token outputs the ID of $s$.
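The overall control flow of Algorithm 3 can be illustrated with a centralized simulation. This is a sketch under stated assumptions, not the distributed protocol: all function names are hypothetical, Sample-Coupon is replaced by a direct uniform choice from a list, Send-More-Coupons by generating $\eta$ fresh short walks on demand, and round complexity is not modeled at all.

```python
import random

def short_walk(graph, start, length, rng):
    """Take `length` uniform-neighbor steps from `start`; return the end node."""
    node = start
    for _ in range(length):
        node = rng.choice(graph[node])
    return node

def make_coupons(graph, v, count, lam, rng):
    """Create `count` coupons of v as (destination, length) pairs, each with
    a length drawn uniformly from [lam, 2*lam - 1]."""
    coupons = []
    for _ in range(count):
        length = lam + rng.randrange(lam)
        coupons.append((short_walk(graph, v, length, rng), length))
    return coupons

def single_random_walk(graph, s, ell, lam, eta, rng):
    """Return (destination, stitched_steps, connectors) for a walk of length ell."""
    # Phase 1: every node v prepares eta * deg(v) coupons.
    coupons = {v: make_coupons(graph, v, eta * len(graph[v]), lam, rng)
               for v in graph}
    # Phase 2: stitch short walks while at most ell - 2*lam steps are done.
    token, done, connectors = s, 0, [s]
    while done <= ell - 2 * lam:
        if not coupons[token]:
            # Stand-in for Send-More-Coupons: eta fresh coupons from this node.
            coupons[token] = make_coupons(graph, token, eta, lam, rng)
        # Stand-in for Sample-Coupon: uniform choice, then delete the coupon.
        index = rng.randrange(len(coupons[token]))
        dest, length = coupons[token].pop(index)
        token, done = dest, done + length
        connectors.append(token)
    # Finish the remaining ell - done steps naively.
    return short_walk(graph, token, ell - done, rng), done, connectors
```

Since every stitched segment has length at most $2\lambda - 1$, the loop exits with strictly fewer than $\ell$ steps completed, so the naive tail always has a nonnegative number of steps left to walk. Deleting each used coupon is what keeps the stitched segments independent of one another.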

In the last part, we improve the running time of Phase 2 further, using a probabilistic bound, leading to a better running time overall. The key ingredient here is the Random Walk Visits Lemma (cf. Lemma 3.12) stated formally in Section 3.4 and proved in Section 3.5. Then we use the fact that the short walks have random length to obtain the running time bound.

Lemma 3.5. For any $\eta$ and $\lambda$ such that $\eta\lambda \ge 32\sqrt{\ell}(\log n)^3$, Phase 2 finishes in $O(\frac{\ell D}{\lambda})$ rounds with high probability.

Using the results above, we conclude the following theorem.

Theorem 3.6. For any $\ell$, Algorithm Single-Random-Walk (cf. Algorithm 3) solves 1-RW-DoS correctly and, with high probability, finishes in $\tilde{O}(\sqrt{\ell D})$ rounds.

Proof. Set $\eta = 1$ and $\lambda = 32\sqrt{\ell D}(\log n)^3$. Using Lemma 3.2 and 3.5, the algorithm finishes in $\tilde{O}(\lambda\eta + \frac{\ell D}{\lambda}) = \tilde{O}(\sqrt{\ell D})$ rounds with high probability. □

3.1 Correctness (Proof of Lemma 3.1)

In this section, we prove Lemma 3.1 which claims the correctness of the algorithm. Recall that the lemma is as follows.

Lemma 3.1 (Restated). Algorithm Single-Random-Walk solves 1-RW-DoS. That is, for any node $v$, after algorithm Single-Random-Walk finishes, the probability that $v$ outputs the ID of $s$ is equal to the probability that it is the destination of a random walk of length $\ell$ starting at $s$.

To prove this lemma, we first claim that Sample-Coupon returns a coupon where the node holding this coupon is a destination of a short walk of length uniformly random in $[\lambda, 2\lambda - 1]$.

Claim 3.7. Each short walk length (returned by Sample-Coupon) is uniformly sampled from the range $[\lambda, 2\lambda - 1]$.

Proof. Each walk can be created in two ways.

• It is created in Phase 1. In this case, since we pick the length of each walk uniformly from the range $[\lambda, 2\lambda - 1]$, the claim clearly holds.

• It is created by Send-More-Coupons. In this case, the claim holds by the technique of reservoir sampling [54]: Observe that after the $\lambda$-th step of the walk is completed, we stop extending each walk at any length between $\lambda$ and $2\lambda - 1$ uniformly. To see this, observe that we stop at length $\lambda$ with probability $1/\lambda$. If the walk does not stop, it will stop at length $\lambda + 1$ with probability $\frac{1}{\lambda - 1}$. This means that the walk will stop at length $\lambda + 1$ with probability $\frac{\lambda - 1}{\lambda} \times \frac{1}{\lambda - 1} = \frac{1}{\lambda}$. Similarly, it can be argued that the walk will stop at length $i$ for any $i \in [\lambda, 2\lambda - 1]$ with probability $\frac{1}{\lambda}$.

□ 
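The stopping rule used in this proof can be checked exactly with rational arithmetic. The sketch below assumes the rule is: a walk that is still alive at length $\lambda + i$ stops with probability $1/(\lambda - i)$, for $i = 0, \ldots, \lambda - 1$; the function name is hypothetical.

```python
from fractions import Fraction

def stop_length_distribution(lam):
    """Exact distribution of the final walk length under the stopping rule:
    for i = 0..lam-1, a walk still alive at length lam + i stops with
    probability 1 / (lam - i)."""
    alive = Fraction(1)  # probability that the walk has not stopped yet
    dist = {}
    for i in range(lam):
        stop = Fraction(1, lam - i)
        dist[lam + i] = alive * stop
        alive *= 1 - stop
    return dist

# Every length in [lam, 2*lam - 1] receives probability exactly 1/lam.
assert all(p == Fraction(1, 5) for p in stop_length_distribution(5).values())
```

The survival probability after $i$ rounds is $(\lambda - i)/\lambda$, which cancels against the stopping probability $1/(\lambda - i)$ and leaves $1/\lambda$ for every length.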

Moreover, we claim that Sample-Coupon($v$) samples a short walk uniformly at random among many coupons (and therefore, short walks starting at $v$).

Claim 3.8. Algorithm Sample-Coupon($v$) (cf. Algorithm 1), for any node $v$, samples a coupon distributed by $v$ uniformly at random.

Proof. Assume that before this algorithm starts, there are $t$ (without loss of generality, let $t > 0$) coupons containing the ID of $v$ stored in some nodes in the network. The goal is to show that Sample-Coupon brings one of these coupons to $v$ with uniform probability. For any node $u$, let $T_u$ be the subtree rooted at $u$ and let $S_u$ be the set of coupons in $T_u$. (Therefore, $T_v = T$ and $|S_v| = t$.)

We claim that any node $u$ returns a coupon to its parent with uniform probability (i.e., for any coupon $x \in S_u$, $P[u \text{ returns } x]$ is $1/|S_u|$ (if $|S_u| > 0$)). We prove this by induction on the height of the tree. This claim clearly holds for the base case where $u$ is a leaf node. Now, for any non-leaf node $u$, assume that the claim is true for any of its children. To be precise, suppose that $u$ receives coupons and counts from $q - 1$ children. Assume that it receives coupons $d_1, d_2, \ldots, d_{q-1}$ and counts $c_1, c_2, \ldots, c_{q-1}$ from nodes $u_1, u_2, \ldots, u_{q-1}$, respectively. (Also recall that $d_0$ is the sample of its own coupons (if it exists) and $c_0$ is the number of its own coupons, so $d_0$ is any fixed one of them with probability $1/c_0$.) By induction, $d_j$ is sent from $u_j$ to $u$ with probability $1/|S_{u_j}|$, for any $1 \le j \le q - 1$. Moreover, $c_j = |S_{u_j}|$ for any $j$. Therefore, any coupon $d_j$ will be picked with probability

$$ \frac{1}{|S_{u_j}|} \times \frac{c_j}{c_0 + c_1 + \cdots + c_{q-1}} = \frac{1}{|S_u|} $$

as claimed.

Claim 3.8 follows by applying the claim above to $v$. □

The above two claims imply the correctness of the Algorithm Single-Random-Walk as shown next.

Proof of Lemma 3.1. Any two walks of length in $[\lambda, 2\lambda - 1]$ (possibly from different sources) are independent from each other. Moreover, a walk from a particular node is picked uniformly at random. Therefore, algorithm Single-Random-Walk is equivalent to having a source node perform a walk of length between $\lambda$ and $2\lambda - 1$ and then have the destination do another walk of length between $\lambda$ and $2\lambda - 1$ and so on. That is, for any node $v$, the probability that $v$ outputs the ID of $s$ is equal to the probability that it is the destination of a random walk of length $\ell$ starting at $s$. □

3.2 Analysis of Phase 1 (Proof of Lemma 3.2)

In this section, we prove the performance of Phase 1 claimed in Lemma 3.2. Recall that the lemma is as follows.

Lemma 3.2 (Restated). Phase 1 finishes in $\tilde{O}(\lambda\eta)$ rounds with high probability.

We now prove the lemma. For each coupon $C$, any $j = 1, 2, \ldots, \lambda$, and any edge $e$, we define $X_C^j(e)$ to be a random variable having value 1 if $C$ is sent through $e$ in the $j$-th iteration (i.e., when the counter on $C$ is increased from $j - 1$ to $j$). Let $X^j(e) = \sum_{C:\ \text{coupon}} X_C^j(e)$. We compute the expected number of coupons that go through an edge $e$, as follows.

Claim 3.9. For any edge $e$ and any $j$, $E[X^j(e)] = 2\eta$.

Proof. Recall that each node $v$ starts with $\eta\deg(v)$ coupons and each coupon takes a random walk. We prove that after any given number of steps $j$, the expected number of coupons at node $v$ is still $\eta\deg(v)$. Consider the random walk's probability transition matrix, call it $A$. In this case $Au = u$ for the vector $u$ having value $\frac{\deg(v)}{2m}$ at each node $v$, where $m$ is the number of edges in the graph (since this $u$ is the stationary distribution of an undirected unweighted graph). Now the number of coupons we started with at any node $i$ is proportional to its stationary distribution, therefore, in expectation, the number of coupons at any node remains the same.

To calculate $E[X^j(e)]$, notice that edge $e$ will receive coupons from its two end points, say $x$ and $y$. The number of coupons it receives from node $x$ in expectation is exactly the number of coupons at $x$ divided by $\deg(x)$, which is $\eta$, and likewise for $y$. The claim follows. □
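The stationarity argument in this proof can be verified exactly on a small example. The sketch below assumes an undirected graph given as a hypothetical adjacency-list dictionary and uses rational arithmetic, so the comparison is exact rather than statistical; the function names are made up for this illustration.

```python
from fractions import Fraction

def expected_counts_after_step(graph, counts):
    """One synchronous step in which every coupon moves to a uniformly random
    neighbor. Returns the expected number of coupons at each node afterwards."""
    nxt = {v: Fraction(0) for v in graph}
    for u, nbrs in graph.items():
        for z in nbrs:
            nxt[z] += Fraction(counts[u]) / len(nbrs)
    return nxt

def expected_edge_load(graph, counts, u, z):
    """Expected number of coupons crossing edge {u, z} in one step,
    counting both directions."""
    return (Fraction(counts[u]) / len(graph[u])
            + Fraction(counts[z]) / len(graph[z]))

# A small irregular graph: a triangle 0-1-2 with a pendant node 3 attached to 0.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
eta = 3
counts = {v: eta * len(nbrs) for v, nbrs in graph.items()}
assert expected_counts_after_step(graph, counts) == counts
assert expected_edge_load(graph, counts, 0, 3) == 2 * eta
```

Starting with a number of coupons proportional to the degree is what makes the per-edge load independent of the graph's shape: a high-degree node holds more coupons but also spreads them over more edges.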

By Chernoff's bound (e.g., in [47, Theorem 4.4]), for any edge $e$ and any $j$,

$$ P[X^j(e) \ge 4\eta\log n] \le 2^{-4\log n} = n^{-4}. $$

(We note that the number $4\eta\log n$ above can be improved to $c\eta\log n/\log\log n$ for some constant $c$. This improvement of $\log\log n$ can be further improved as $\eta$ increases. This fact is useful in practice but does not help improve our claimed running time since we always hide a polylog $n$ factor.)

It follows that the probability that there exists an edge $e$ and an integer $1 \le j \le \lambda$ such that $X^j(e) \ge 4\eta\log n$ is at most $|E(G)|\lambda n^{-4} \le \frac{1}{n}$ since $|E(G)| \le n^2$ and $\lambda \le \ell \le n$ (by the way we define $\lambda$).

Now suppose that $X^j(e) \le 4\eta\log n$ for every edge $e$ and every integer $j \le \lambda$. This implies that we can extend all walks of length $i$ to length $i + 1$ in $4\eta\log n$ rounds. Therefore, we obtain walks of length $\lambda$ in $4\lambda\eta\log n$ rounds, with high probability, as claimed.

3.3 Worst-case bound of Phase 2 (Proof of Lemma 3.3)

In this section, we prove the worst-case performance of Phase 2 claimed in Lemma 3.3. Recall that the lemma is as follows.

Lemma 3.3 (Restated). Phase 2 finishes in $O(\frac{\ell D}{\lambda} + \frac{\ell}{\eta})$ rounds.

We first analyze the running time of Send-More-Coupons and Sample-Coupon.

Lemma 3.10. For any $v$, Send-More-Coupons($v$, $\eta$, $\lambda$) always finishes within $O(\lambda)$ rounds.

Proof. Consider any node $u$ during the execution of the algorithm. If it contains $x$ coupons of $v$ (i.e., which just contain the ID of $v$), for some $x$, it has to pick $x$ of its neighbors at random, and pass the coupon of $v$ to each of these $x$ neighbors. It might pass these coupons to fewer than $x$ neighbors and cause congestion if the coupons are sent separately. However, it sends only the ID of $v$ and a count to each neighbor, where the count represents the number of coupons it wishes to send to such neighbor. Note that there is only one ID sent during the process since only one node calls Send-More-Coupons at a time. Therefore, there is no congestion and thus the algorithm terminates in $O(\lambda)$ rounds. □

Lemma 3.11. Sample-Coupon always finishes within $O(D)$ rounds.

Proof. Since constructing a BFS tree can be done easily in $O(D)$ rounds, it is left to bound the time of the second part where the algorithm wishes to sample one of many coupons (having its ID) spread across the graph. The sampling is done while retracing the BFS tree starting from leaf nodes, eventually reaching the root. The main observation is that when a node receives multiple samples from its children, it only sends one of them to its parent. Therefore, there is no congestion. The total number of rounds required is therefore the number of levels in the BFS tree, $O(D)$. □

Now we prove the worst-case bound of Phase 2. First, observe that Sample-Coupon is called $O(\frac{\ell}{\lambda})$ times since it is called only by a connector (to find the next node to forward the token to). By Lemma 3.11, this algorithm takes $O(\frac{\ell D}{\lambda})$ rounds in total. Next, we claim that Send-More-Coupons is called at most $O(\frac{\ell}{\lambda\eta})$ times in total (summing over all nodes). This is because when a node $v$ calls Send-More-Coupons($v$, $\eta$, $\lambda$), all $\eta$ walks starting at $v$ must have been stitched and therefore $v$ contributes $\lambda\eta$ steps of walk to the long walk we are constructing. It follows from Lemma 3.10 that Send-More-Coupons takes $O(\frac{\ell}{\lambda\eta} \cdot \lambda) = O(\frac{\ell}{\eta})$ rounds in total. The claimed worst-case bound follows by summing up the total running times of Sample-Coupon and Send-More-Coupons.

3.4 A Probabilistic bound for Phase 2 (Proof of Lemma 3.5)

In this section, we prove the high probability time bound of Phase 2 claimed in Lemma 3.5. Recall that the lemma is as follows.

Lemma 3.5 (Restated). For any $\eta$ and $\lambda$ such that $\eta\lambda \ge 32\sqrt{\ell}(\log n)^3$, Phase 2 finishes in $O(\frac{\ell D}{\lambda})$ rounds with high probability.

Recall that we may assume that $\ell = O(m^2)$ (cf. Observation 2.1). We prove the stronger bound using the following lemmas. As mentioned earlier, to bound the number of times Send-More-Coupons is invoked, we need a technical result on random walks that bounds the number of times a node will be visited in an $\ell$-length random walk. Consider a simple random walk on a connected undirected graph on $n$ vertices. Let $\deg(x)$ denote the degree of $x$, and let $m$ denote the number of edges. Let $N_t^x(y)$ denote the number of visits to vertex $y$ by time $t$, given that the walk started at vertex $x$.

Now, consider $k$ walks, each of length $\ell$, starting from (not necessarily distinct) nodes $x_1, x_2, \ldots, x_k$. We show a key technical lemma that applies to random walks on any (undirected) graph: With high probability, no vertex $y$ is visited more than $32\deg(y)\sqrt{k\ell + 1}\log n + k$ times.

Lemma 3.12 (Random Walk Visits Lemma). For any nodes $x_1, x_2, \ldots, x_k$, and $\ell = O(m^2)$,

$$ P\Big(\exists y \text{ s.t. } \sum_{i=1}^{k} N_\ell^{x_i}(y) \ge 32\deg(y)\sqrt{k\ell + 1}\log n + k\Big) \le 1/n. $$

Since the proof of this lemma is interesting on its own and lengthy, we defer it to Section 3.5. We note that one can also show a similar bound for a specific vertex $y$, i.e., a bound on $P(\sum_{i=1}^{k} N_\ell^{x_i}(y) \ge 32\deg(y)\sqrt{k\ell + 1}\log n + k)$. Since we will not use this bound here, we defer it to Lemma 3.18 in Subsection 3.5. Moreover, we prove the above lemma only for a specific number of visits of roughly $\sqrt{k\ell}$ because this is the expected number of visits (we show this in Proposition 3.16 in Section 3.5). It might be possible to prove more general bounds; however, we do not include them here since they need more proofs and are not relevant to the results of this paper.

Also note that Lemma 3.12 is not true if we do not restrict $\ell$ to be $O(m^2)$. For example, consider a star network and a walk of length $\ell$ such that $\ell \gg n^2$ and $\ell$ is larger than the mixing time. In this case, this walk will visit the center of the star $\Omega(\ell)$ times with high probability. This would contradict Lemma 3.12, which says that the center will be visited $\tilde{O}(n\sqrt{\ell}) = o(\ell)$ times with high probability. We can modify the statement of Lemma 3.12 to hold for a general value of $\ell$ as follows (this fact is not needed in this paper): $P(\exists y \text{ s.t. } \sum_{i=1}^{k} N_\ell^{x_i}(y) \ge 32\deg(y)\sqrt{k\ell + 1}\log n + k + \ell\deg(y)/m) \le 1/n$. (Recall that $m$ is the number of edges in the network.) This inequality can be proved using Lemma 3.12 and the fact that $m^2$ is larger than the mixing time, which means that the walk will visit vertex $x$ with probability $\deg(x)/m$ in each step after the $(m^2)$-th step.

Lemma 3.12 says that the number of visits to each node can be bounded. However, for each node, we are only interested in the case where it is used as a connector. The lemma below shows that the number of visits as a connector can be bounded as well; i.e., if any node $v$ appears $t$ times in the walk, then it is likely to appear roughly $t/\lambda$ times as a connector.

Lemma 3.13. For any vertex $v$, if $v$ appears in the walk at most $t$ times then it appears as a connector node at most $t(\log n)^2/\lambda$ times with probability at least $1 - 1/n^2$.

At first thought, the lemma above might sound correct even when we do not randomize the length of the short walks since the connectors are spread out in steps of length approximately $\lambda$. However, there might be some periodicity that results in the same node being visited multiple times but exactly at $\lambda$-intervals. (As we described earlier, one example is when the input network is a star graph and $\lambda = 2$.) This is where we crucially use the fact that the algorithm uses walks of length uniformly random in $[\lambda, 2\lambda - 1]$. The proof then goes via constructing another process equivalent to partitioning the $\ell$ steps into intervals of $\lambda$ and then sampling points from each interval. We analyze this by constructing a different process that stochastically dominates the process of a node occurring as a connector at various steps in the $\ell$-length walk and then use a Chernoff bound argument.

In order to give a detailed proof of Lemma 3.13, we need the following two claims.

Claim 3.14. Consider any sequence $A$ of numbers $a_1, \ldots, a_{\ell'}$ of length $\ell'$. For any integer $\lambda'$, let $B$ be a sequence $a_{\lambda' + r_1}, a_{2\lambda' + r_1 + r_2}, \ldots, a_{i\lambda' + r_1 + \cdots + r_i}, \ldots$ where $r_i$, for any $i$, is a random integer picked uniformly from $[0, \lambda' - 1]$. Consider another subsequence of numbers $C$ of $A$ where an element in $C$ is picked from "every $\lambda'$ numbers" in $A$; i.e., $C$ consists of $\lfloor \ell'/\lambda' \rfloor$ numbers $c_1, c_2, \ldots$ where, for any $i$, $c_i$ is chosen uniformly at random from $a_{(i-1)\lambda'+1}, a_{(i-1)\lambda'+2}, \ldots, a_{i\lambda'}$. Then, $P[C \text{ contains } \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}] = P[B = \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}]$ for any set $\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$.

Proof. Observe that $B$ will be equal to $\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$ only for a specific value of $r_1, r_2, \ldots, r_k$. Since each of $r_1, r_2, \ldots, r_k$ is chosen uniformly at random from $[0, \lambda' - 1]$, $P[B = \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}] = \lambda'^{-k}$. Moreover, $C$ will contain $\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$ if and only if, for each $j$, we pick $a_{i_j}$ from the interval that contains it (i.e., from $a_{(i'-1)\lambda'+1}, a_{(i'-1)\lambda'+2}, \ldots, a_{i'\lambda'}$, for some $i'$). (Note that $a_{i_1}, a_{i_2}, \ldots$ are all in different intervals because $i_{j+1} - i_j \ge \lambda'$ for all $j$.) Therefore, $P[C \text{ contains } \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}] = \lambda'^{-k}$. □

Claim 3.15. Consider any sequence $A$ of numbers $a_1, \ldots, a_{\ell'}$ of length $\ell'$. Consider a subsequence of numbers $C$ of $A$ where an element in $C$ is picked from "every $\lambda'$ numbers" in $A$; i.e., $C$ consists of $\lfloor \ell'/\lambda' \rfloor$ numbers $c_1, c_2, \ldots$ where, for any $i$, $c_i$ is chosen uniformly at random from $a_{(i-1)\lambda'+1}, a_{(i-1)\lambda'+2}, \ldots, a_{i\lambda'}$. For any number $x$, let $n_x$ be the number of appearances of $x$ in $A$; i.e., $n_x = |\{i \mid a_i = x\}|$. Then, for any $R \ge 6n_x/\lambda'$, $x$ appears in $C$ more than $R$ times with probability at most $2^{-R}$.

Proof. For $i = 1, 2, \ldots, \lfloor \ell'/\lambda' \rfloor$, let $X_i$ be a 0/1 random variable that is 1 if and only if $c_i = x$, and let $X = \sum_i X_i$. That is, $X$ is the number of appearances of $x$ in $C$. Clearly, $E[X] = n_x/\lambda'$. Since the $X_i$'s are independent, we can apply the Chernoff bound (e.g., in [47, Theorem 4.4]): For any $R \ge 6E[X] = 6n_x/\lambda'$,

$$ P[X \ge R] \le 2^{-R}. $$

The claim is thus proved. □

Proof of Lemma 3.13. Now we use the claims to prove the lemma. Choose $\ell' = \ell$ and $\lambda' = \lambda$ and consider any node $v$ that appears at most $t$ times. The number of times it appears as a connector node is the number of times it appears in the subsequence $B$ described in Claim 3.14. By applying Claim 3.14 and 3.15 with $R = t(\log n)^2$, we have that $v$ appears in $B$ more than $t(\log n)^2$ times with probability at most $1/n^2$ as desired. □

Now we are ready to prove the probabilistic bound of Phase 2 (cf. Lemma 3.5).

First, we claim, using Lemma 3.12 and 3.13, that each node $x$ is used as a connector node at most $\frac{32\deg(x)\sqrt{\ell}(\log n)^3}{\lambda}$ times with probability at least $1 - 2/n$. To see this, observe that the claim holds if each node $x$ is visited at most $t(x) = 32\deg(x)\sqrt{\ell + 1}\log n$ times and consequently appears as a connector node at most $t(x)(\log n)^2/\lambda$ times. By Lemma 3.12, the first condition holds with probability at least $1 - 1/n$. By Lemma 3.13 and the union bound over all nodes, the second condition holds with probability at least $1 - 1/n$, provided that the first condition holds. Therefore, both conditions hold together with probability at least $1 - 2/n$ as claimed.

Now, observe that Sample-Coupon is invoked $O(\frac{\ell}{\lambda})$ times (only when we stitch the walks) and therefore, by Lemma 3.11, contributes $O(\frac{\ell D}{\lambda})$ rounds. Moreover, we claim that Send-More-Coupons is never invoked, with probability at least $1 - 2/n$. To see this, recall our claim above that each node $x$ is used as a connector node at most $\frac{32\deg(x)\sqrt{\ell}(\log n)^3}{\lambda}$ times. Additionally, observe that we have prepared this many walks in Phase 1; i.e., after Phase 1, each node $x$ has $\eta\deg(x) \ge \frac{32\deg(x)\sqrt{\ell}(\log n)^3}{\lambda}$ short walks. The claim follows.

Therefore, with probability at least $1 - 2/n$, the number of rounds is $O(\frac{\ell D}{\lambda})$ as claimed.

3.5 Proof of Random Walk Visits Lemma (cf. Lemma 3.12)

In this section, we prove the Random Walk Visits Lemma introduced in the previous section. We restate it here for the sake of readability.

Lemma 3.12 (Random Walk Visits Lemma, Restated). For any nodes $x_1, x_2, \ldots, x_k$, and $\ell = O(m^2)$,

$$ P\Big(\exists y \text{ s.t. } \sum_{i=1}^{k} N_\ell^{x_i}(y) \ge 32\deg(y)\sqrt{k\ell + 1}\log n + k\Big) \le 1/n. $$

We start with the bound of the first moment of the number of visits at each node by each walk.

Proposition 3.16. For any node $x$, node $y$ and $t = O(m^2)$,

$$ E[N_t^x(y)] \le 8\deg(y)\sqrt{t + 1}. \qquad (1) $$

To prove the above proposition, let $P$ denote the transition probability matrix of such a random walk and let $\pi$ denote the stationary distribution of the walk, which in this case is simply proportional to the degree of the vertex, and let $\pi_{\min} = \min_x \pi(x)$.

The basic bound we use is the following estimate from Lyons (see Lemma 3.4 and Remark 4 in [45]). Let $Q$ denote the transition probability matrix of a chain with self-loop probability $\alpha > 0$, and with $c = \min\{\pi(x)Q(x, y) : x \ne y \text{ and } Q(x, y) > 0\}$. Note that for a random walk on an undirected graph, $c = \frac{1}{2m}$. For $k > 0$ a positive integer (denoting time),

$$ \left| \frac{Q^k(x, y)}{\pi(y)} - 1 \right| \le \min\left\{ \frac{1}{\alpha c\sqrt{k + 1}},\ \frac{1}{2\alpha^2 c^2 (k + 1)} \right\}. \qquad (2) $$

For $k \le \beta m^2$ for a sufficiently small constant $\beta$, and small $\alpha$, the above can be simplified to the following bound (we use Observation 2.1 here); see Remark 3 in [45].

$$ Q^k(x, y) \le \frac{4\pi(y)}{c\sqrt{k + 1}} = \frac{4\deg(y)}{\sqrt{k + 1}}. \qquad (3) $$

Note that given a simple random walk on a graph $G$, and a corresponding matrix $P$, one can always switch to the lazy version $Q = (I + P)/2$, and interpret it as a walk on graph $G'$, obtained by adding self-loops to vertices in $G$ so as to double the degree of each vertex. In the following, with abuse of notation we assume our $P$ is such a lazy version of the original one.

Proof of Proposition 3.16. Let $X_0, X_1, \ldots$ describe the random walk, with $X_i$ denoting the position of the
walk at time $i \ge 0$, and let $\mathbf{1}_A$ denote the indicator (0-1) random variable, which takes the value 1
when the event $A$ is true. In the following we also use the subscript $x$ to denote the fact that the
probability or expectation is with respect to starting the walk at vertex $x$. We get the expectation

$$E[N_t^x(y)] = E_x\Bigl[\sum_{i=0}^{t} \mathbf{1}_{\{X_i = y\}}\Bigr] = \sum_{i=0}^{t} P^i(x,y)$$
$$\le 4\deg(y)\sum_{i=0}^{t}\frac{1}{\sqrt{i+1}} \quad \text{(using the above inequality (3))}$$
$$\le 8\deg(y)\sqrt{t+1}.$$

□
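As a sanity check on Proposition 3.16, the following Python sketch computes $E[N_t^x(y)]$ exactly for a lazy walk on a small cycle by summing the $i$-step transition probabilities for $i = 0, \ldots, t$, and compares the result with $8\deg(y)\sqrt{t+1}$. The example graph, the value of $t$, and the helper names `lazy_walk_matrix` and `expected_visits` are illustrative assumptions and do not come from the paper.

```python
import math

def lazy_walk_matrix(adj):
    """Transition matrix of the lazy walk Q = (I + P)/2 on an undirected graph."""
    n = len(adj)
    Q = [[0.0] * n for _ in range(n)]
    for u in range(n):
        Q[u][u] = 0.5
        for v in adj[u]:
            Q[u][v] += 0.5 / len(adj[u])
    return Q

def expected_visits(adj, x, y, t):
    """E[N_t^x(y)] = sum_{i=0}^{t} Q^i(x, y), obtained by propagating the distribution."""
    Q = lazy_walk_matrix(adj)
    n = len(adj)
    dist = [0.0] * n
    dist[x] = 1.0  # the walk starts at x with probability 1
    total = 0.0
    for _ in range(t + 1):
        total += dist[y]
        dist = [sum(dist[u] * Q[u][v] for u in range(n)) for v in range(n)]
    return total

# Assumed example: a cycle on 8 nodes, start x = 0, target y = 3, t = 50 steps.
cycle = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
t = 50
visits = expected_visits(cycle, 0, 3, t)
bound = 8 * len(cycle[3]) * math.sqrt(t + 1)
print(visits, bound)
```

On such a small graph the bound is loose; the check only illustrates how the quantity in (1) is defined.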

Using the above proposition, we bound the number of visits of each walk at each node, as 
follows. 

Lemma 3.17. For $t = O(m^2)$ and any vertex $y \in G$, the random walk started at $x$ satisfies:

$$P\bigl(N_t^x(y) \ge 32\deg(y)\sqrt{t+1}\,\log n\bigr) \le \frac{1}{n^2}.$$



Proof. First, it follows from the Proposition and Markov's inequality that

$$P\bigl(N_t^x(y) \ge 4 \cdot 8\deg(y)\sqrt{t+1}\bigr) \le \frac{1}{4}. \tag{4}$$

For any $r$, let $L_r^x(y)$ be the time that the random walk (started at $x$) visits $y$ for the $r$-th time.
Observe that, for any $r$, $N_t^x(y) \ge r$ if and only if $L_r^x(y) \le t$. Therefore,

$$P(N_t^x(y) \ge r) = P(L_r^x(y) \le t). \tag{5}$$

Let $r^* = 32\deg(y)\sqrt{t+1}$. By (4) and (5), $P(L_{r^*}^x(y) \le t) \le \frac{1}{4}$. We claim that

$$P(L_{r^*\log n}^x(y) \le t) \le \Bigl(\frac{1}{4}\Bigr)^{\log n} = \frac{1}{n^2}. \tag{6}$$

To see this, divide the walk into $\log n$ independent subwalks, each visiting $y$ exactly $r^*$ times. Since
the event $L_{r^*\log n}^x(y) \le t$ implies that all subwalks have length at most $t$, (6) follows. Now, by
applying (5) again,

$$P(N_t^x(y) \ge r^*\log n) = P(L_{r^*\log n}^x(y) \le t) \le \frac{1}{n^2}$$

as desired. □

We now extend the above lemma to bound the number of visits of all the walks at each particular 
node. 

Lemma 3.18 (Random Walk Visits Lemma For a Specific Vertex). For $\gamma > 0$, and $t = O(m^2)$,
and for any vertex $y \in G$, the random walk started at $x$ satisfies:

$$P\Bigl(\sum_{i=1}^{k} N_t^{x_i}(y) \ge 32\deg(y)\sqrt{kt+1}\,\log n + k\Bigr) \le \frac{1}{n^2}.$$



Proof. First, observe that, for any $r$,

$$P\Bigl(\sum_{i=1}^{k} N_t^{x_i}(y) - k \ge r\Bigr) \le P\bigl[N_{kt}^{y}(y) \ge r\bigr]. \tag{7}$$

To see this, we construct a walk $W$ of length $kt$ starting at $y$ in the following way: For each $i$, denote
a walk of length $t$ starting at $x_i$ by $W_i$. Let $\tau_i$ and $\tau_i'$ be the first and last time (not later than time
$t$) that $W_i$ visits $y$. Let $W_i'$ be the subwalk of $W_i$ from time $\tau_i$ to $\tau_i'$. We construct a walk $W$ by
stitching $W_1', W_2', \ldots, W_k'$ together and complete the rest of the walk (to reach the length $kt$) by a
normal random walk. It then follows that the number of visits to $y$ by $W_1, W_2, \ldots, W_k$ (excluding
the starting step) is at most the number of visits to $y$ by $W$. The first quantity is $\sum_{i=1}^{k} N_t^{x_i}(y) - k$.
(The term '$-k$' comes from the fact that we do not count the first visit to $y$ by each $W_i$, which is
the starting step of each $W_i'$.) The second quantity is $N_{kt}^{y}(y)$. The observation thus follows.

Therefore,

$$P\Bigl(\sum_{i=1}^{k} N_t^{x_i}(y) \ge 32\deg(y)\sqrt{kt+1}\,\log n + k\Bigr) \le P\bigl(N_{kt}^{y}(y) \ge 32\deg(y)\sqrt{kt+1}\,\log n\bigr) \le \frac{1}{n^2},$$

where the last inequality follows from Lemma 3.17. □

The Random Walk Visits Lemma (cf. Lemma 3.12) follows immediately from Lemma 3.18 by 
union bounding over all nodes. 

4 Variations, Extensions, and Generalizations 
4.1 Computing k Random Walks 

We now consider the scenario when we want to compute $k$ walks of length $\ell$ from different (not
necessarily distinct) sources $s_1, s_2, \ldots, s_k$. We show that Single-Random-Walk can be extended
to solve this problem. Consider the following algorithm.

Many-Random-Walks. Let $\lambda = (32\sqrt{k\ell D+1}\,\log n + k)(\log n)^2$ and $\eta = 1$. If $\lambda > \ell$ then run
the naive random walk algorithm, i.e., the sources find walks of length $\ell$ simultaneously by sending
tokens. Otherwise, do the following. First, modify Phase 2 of Single-Random-Walk to create
multiple walks, one at a time; i.e., in the second phase, we stitch the short walks together to get a
walk of length $\ell$ starting at $s_1$, then do the same thing for $s_2$, $s_3$, and so on.

The correctness of Many-Random-Walks follows from Lemma 3.1; intuitively, this algorithm
outputs independent random walks because it obtains long walks by stitching short walks that are
all independent (no short walk is used twice). We now prove the running time of this algorithm.

Theorem 4.1. Many-Random-Walks finishes in $\tilde{O}(\min(\sqrt{k\ell D}+k,\ k+\ell))$ rounds with high
probability.

Proof. First, consider the case where $\lambda > \ell$. In this case, $\min(\sqrt{k\ell D}+k,\ \sqrt{k\ell}+k+\ell) = \tilde{O}(\sqrt{k\ell}+k+\ell)$.
By Lemma 3.12, each node $x$ will be visited at most $\tilde{O}(\deg(x)(\sqrt{k\ell}+k))$ times. Therefore, using
the same argument as Lemma 3.2, the congestion is $\tilde{O}(\sqrt{k\ell}+k)$ with high probability. Since the
dilation is $\ell$, Many-Random-Walks takes $\tilde{O}(\sqrt{k\ell}+k+\ell)$ rounds as claimed. Since $2\sqrt{k\ell} \le k+\ell$,
this bound reduces to $\tilde{O}(k+\ell)$.

Now, consider the other case where $\lambda \le \ell$. In this case, $\min(\sqrt{k\ell D}+k,\ \sqrt{k\ell}+k+\ell) =
\tilde{O}(\sqrt{k\ell D}+k)$. Phase 1 takes $\tilde{O}(\lambda\eta) = \tilde{O}(\sqrt{k\ell D}+k)$. The stitching in Phase 2 takes $\tilde{O}(k\ell D/\lambda) =
\tilde{O}(\sqrt{k\ell D})$. Moreover, by Lemma 3.12, Send-More-Coupons will never be invoked. Therefore, the
total number of rounds is $\tilde{O}(\sqrt{k\ell D}+k)$ as claimed. □
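The case split in Many-Random-Walks can be sketched in a few lines of Python: compute $\lambda = (32\sqrt{k\ell D+1}\log n + k)(\log n)^2$ and pick the naive algorithm when $\lambda > \ell$, otherwise the stitching approach. The function name `many_random_walks_plan`, the strategy labels, and the use of base-2 logarithms are assumptions made for illustration only.

```python
import math

def many_random_walks_plan(k, ell, D, n):
    """Return (lam, strategy) for k walks of length ell on an n-node graph of diameter D.

    lam follows the definition in Many-Random-Walks (with eta = 1);
    the base of the logarithm (2) is an assumption of this sketch.
    """
    log_n = math.log2(n)
    lam = (32 * math.sqrt(k * ell * D + 1) * log_n + k) * log_n ** 2
    # lam > ell: short walks would be longer than the target walk, so walk naively.
    strategy = "naive" if lam > ell else "stitch"
    return lam, strategy

print(many_random_walks_plan(1, 100, 2, 16))    # short walk: naive token forwarding
print(many_random_walks_plan(1, 10**6, 1, 4))   # very long walk: stitching pays off
```

Because of the constant 32 and the $(\log n)^3$ factor, stitching is selected only when $\ell$ is large relative to $kD$.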

4.2 Regenerating the entire random walk 

Our algorithm can be extended to regenerate the entire walk, solving $k$-RW-pos. This will be
used, e.g., in generating a random spanning tree. The algorithm is the following. First, inform all
intermediate connecting nodes of their position, which can be done by keeping track of the walk
length when we do token forwarding in Phase 2. Then, these nodes can regenerate their $O(\sqrt{\ell})$
length short walks by simply sending a message through each of the corresponding short walks. This
can be completed in $\tilde{O}(\sqrt{\ell D})$ rounds with high probability. This is because, with high probability,
Send-More-Coupons will not be invoked and hence all the short walks are generated in Phase 1.
Sending a message through each of these short walks (in fact, sending a message through every
short walk generated in Phase 1) takes time at most the time taken in Phase 1, i.e., $\tilde{O}(\sqrt{\ell D})$ rounds.

4.3 Generalization to the Metropolis-Hastings algorithm 

We now discuss extensions of our algorithm to perform a random walk according to the Metropolis-Hastings
algorithm, a more general type of random walk with numerous applications (e.g., [56]).
The Metropolis-Hastings [31, 46] algorithm gives a way to define a transition probability so that
a random walk converges to any desired distribution $\pi$ (where $\pi_i$, for any node $i$, is the desired
stationary probability at node $i$). It is assumed that every node $i$ knows its steady state probability
$\pi_i$ (and can know its neighbors' steady state probabilities in one round).

The Metropolis-Hastings algorithm is roughly as follows (see, e.g., [31, 46] for the full description).
For any desired distribution $\pi$ and any desired laziness factor $0 < \alpha < 1$, the transition
probability from node $i$ to its neighbor $j$ is defined to be

$$P_{ij} = \alpha \min\bigl(1/d_i,\ \pi_j/(\pi_i d_j)\bigr),$$

where $d_i$ and $d_j$ are the degrees of $i$ and $j$ respectively. It can be shown that a random walk with this
transition probability converges to $\pi$.
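To make the transition rule concrete, the following Python sketch builds the matrix $P_{ij} = \alpha\min(1/d_i, \pi_j/(\pi_i d_j))$ for a small path graph and checks numerically that one step of the walk leaves the target distribution unchanged. Assigning the leftover probability mass to a self-loop $P_{ii}$ is an assumption of this sketch (the text above only defines the off-diagonal entries), as are the example graph, the target distribution, and the helper names.

```python
def metropolis_hastings_matrix(adj, pi, alpha):
    """Build P with P[i][j] = alpha * min(1/d_i, pi_j / (pi_i * d_j)) for neighbors j.

    Assumes a simple graph; the remaining mass stays at i as a self-loop.
    """
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            P[i][j] = alpha * min(1.0 / len(adj[i]), pi[j] / (pi[i] * len(adj[j])))
        P[i][i] = 1.0 - sum(P[i])  # diagonal is still 0.0 when the sum is taken
    return P

def one_step(dist, P):
    """Distribution after one step of the chain with transition matrix P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Assumed example: path 0-1-2-3 with a target that is not proportional to degree.
path = [[1], [0, 2], [1, 3], [2]]
target = [0.1, 0.2, 0.3, 0.4]
P = metropolis_hastings_matrix(path, target, alpha=0.5)
after = one_step(target, P)
print(after)
```

Stationarity holds here because $\pi_i P_{ij} = \alpha\min(\pi_i/d_i, \pi_j/d_j)$ is symmetric in $i$ and $j$.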

Using the transition probability defined above, we now run the Single-Random-Walk algorithm
with one modification: in Phase 1, we generate

$$\eta \cdot \frac{\pi(x)}{\alpha \min_x \pi(x)/\deg(x)}$$

short walks instead of $\eta\deg(x)$.

The correctness of the algorithm follows from Lemma 3.1. The running time follows from the
following theorem.

Theorem 4.2. For any $\eta$ and $\lambda$ such that $\eta\lambda \ge 32\sqrt{\ell}(\log n)^3$, the modified Single-Random-Walk
algorithm stated above finishes in

$$\tilde{O}\Bigl(\lambda\eta \cdot \frac{\max_x \pi(x)/\deg(x)}{\min_y \pi(y)/\deg(y)} + \frac{\ell D}{\lambda}\Bigr)$$

rounds with high probability.

An interesting application of the above theorem is when $\pi$ is the stationary distribution of the simple
random walk, i.e., $\pi(x)$ is proportional to $\deg(x)$, so that the ratio above equals 1. In this
case, we can compute a random walk of length $\ell$ in $\tilde{O}(\lambda\eta + \frac{\ell D}{\lambda})$ rounds, which is exactly Theorem 3.6.
Like Theorem 3.6, the above theorem follows from the following two lemmas, which are similar to
Lemmas 3.2 and 3.5.

Lemma 4.3. For any $\pi$ and $\alpha$, Phase 1 finishes in $O\bigl(\lambda\eta\log n \cdot \frac{\max_x \pi(x)/\deg(x)}{\min_y \pi(y)/\deg(y)}\bigr)$ rounds with high
probability.

Proof. The proof is essentially the same as Lemma 3.2. We present it here for completeness. Let
$\beta = \frac{1}{\alpha \min_x \pi(x)/\deg(x)}$. Consider the case when each node $i$ creates $\beta\pi(i)\eta$ messages. We show that the
lemma holds even in this case.

We use the same definition as in Lemma 3.2. That is, for each message $M$, any $j = 1, 2, \ldots, \lambda$,
and any edge $e$, we define $X_M^j(e)$ to be a random variable having value 1 if $M$ is sent through $e$ in
the $j$-th iteration (i.e., when the counter on $M$ has value $j - 1$). Let $X^j(e) = \sum_{M:\,\text{message}} X_M^j(e)$.
We compute the expected number of messages that go through an edge. As before, we show the
following claim.

Claim 4.4. For any edge $e$ and any $j$, $E[X^j(e)] \le 2\eta \cdot \frac{\max_x \pi(x)/\deg(x)}{\min_y \pi(y)/\deg(y)}$.

Proof. Assume that each node $v$ starts with $\beta\pi(v)\eta$ messages. Each message takes a random walk.
We prove that after any given number of steps $j$, the expected number of messages at node $v$ is
still $\beta\pi(v)\eta$. Consider the random walk's probability transition matrix, say $A$. In this case $Au = u$
for the vector $u$ having value $\pi(v)$ at each node $v$ (since $\pi$ is the stationary distribution). Now the number
of messages we started with at any node $i$ is proportional to its stationary distribution; therefore,
in expectation, the number of messages at any node remains the same.

To calculate $E[X^j(e)]$, notice that edge $e$ will receive messages from its two end points, say
$x$ and $y$. The number of messages it receives from node $x$ in expectation is exactly

$$\beta\pi(x)\eta \times \alpha\min\Bigl(\frac{1}{d_x},\ \frac{\pi_y}{\pi_x d_y}\Bigr) \le \eta \cdot \frac{\pi(x)/\deg(x)}{\min_y \pi(y)/\deg(y)}.$$

The claim follows. □

The high probability analysis follows the same way as the analysis of Lemma 3.2. □

Lemma 4.5. For any $\eta$ and $\lambda$ such that $\eta\lambda \ge 32\sqrt{\ell}(\log n)^3$, Phase 2 finishes in $\tilde{O}(\frac{\ell D}{\lambda})$ rounds with
high probability.

Proof. (Sketched) We first prove a result similar to Proposition 3.16.

Claim 4.6. For any node $x$, node $y$ and $t = O(m^2)$,

$$E[N_t^x(y)] \le \frac{8\pi(y)\sqrt{t+1}}{\alpha\min_x \pi(x)/\deg(x)}. \tag{8}$$

Proof. The proof is similar to the proof of Proposition 3.16 except that

$$c = \alpha\min_x \pi(x)/\deg(x).$$

It follows that

$$E[N_t^x(y)] = E_x\Bigl[\sum_{i=0}^{t}\mathbf{1}_{\{X_i=y\}}\Bigr] = \sum_{i=0}^{t} P^i(x,y)$$
$$\le \sum_{i=0}^{t}\frac{4\pi(y)}{c\sqrt{i+1}} \quad \text{(using the above inequality (3))}$$
$$\le \frac{8\pi(y)\sqrt{t+1}}{\alpha\min_x \pi(x)/\deg(x)}.$$

By following the rest of the proof of Lemma 3.12, we conclude the following.

Claim 4.7. For any nodes $x_1, x_2, \ldots, x_k$, and $\ell = O(m^2)$,

$$P\Bigl(\exists y \text{ s.t. } \sum_{i=1}^{k} N_\ell^{x_i}(y) \ge 32\,\frac{\pi(y)}{\alpha\min_x \pi(x)/\deg(x)}\,\sqrt{k\ell+1}\,\log n + k\Bigr) \le \frac{1}{n}.$$

□



Following the proof of Lemma 3.5, we have that each node $y$ is used as a connector at most

$$32\,\frac{\pi(y)}{\alpha\min_x \pi(x)/\deg(x)} \cdot \frac{\sqrt{\ell}(\log n)^3}{\lambda}$$

times with probability at least $1 - 2/n$. Additionally, observe that we have prepared this many
walks in Phase 1; i.e., after Phase 1, each node $x$ has

$$\eta \cdot \frac{\pi(x)}{\alpha\min_x \pi(x)/\deg(x)} \ge 32\,\frac{\pi(x)}{\alpha\min_x \pi(x)/\deg(x)} \cdot \frac{\sqrt{\ell}(\log n)^3}{\lambda}$$

short walks. The claim follows. □

4.4 k Walks where Sources output Destinations (k-RW-SoD)

In this section we extend our results to $k$-RW-SoD using the following lemma.

Lemma 4.8. Given an algorithm that solves $k$-RW-DoS in $O(S)$ rounds, for any $S$, one can extend
the algorithm to solve $k$-RW-SoD in $O(S + k + D)$ rounds.

The idea of the above lemma is to construct a BFS tree and have each destination node send its
ID to the corresponding source via the root. By using upcast and downcast algorithms [52], this
can be done in $O(k + D)$ rounds.

Proof. Let the algorithm that solves $k$-RW-DoS perform one walk each from source nodes $s_1, \ldots, s_k$.
Let the destinations that output these sources be $d_1, d_2, \ldots, d_k$ respectively. This means that for
each $1 \le i \le k$, node $d_i$ has the ID of source $s_i$. To prove the lemma, we need a way for each
$d_i$ to communicate its own ID to $s_i$, in $O(k + D)$ rounds. The simplest way to do
this is for each node ID pair $(d_i, s_i)$ to be communicated to some fixed node $r$, and then for $r$ to
communicate this information to the sources $s_i$. This is done by $r$ constructing a BFS tree rooted
at itself. This step takes $O(D)$ rounds. Now, each destination $d_i$ sends its pair $(d_i, s_i)$ up
this tree to the root $r$. This can be done in $O(D + k)$ rounds using an upcast algorithm [52]. Node
$r$ then uses the same BFS tree to route back the pairs to the appropriate sources. This again takes
$O(D + k)$ rounds using a downcast algorithm [52]. □

Applying Theorem 4.1 and Lemma 4.8, the following theorem follows.

Theorem 4.9. Given a set of $k$ sources, one can perform $k$-RW-SoD after random walks of length
$\ell$ in $\tilde{O}(\sqrt{k\ell D} + D + k)$ rounds.

5 Applications 

In this section, we present two applications of our algorithm. 

5.1 A Distributed Algorithm for Random Spanning Tree 

We now present an algorithm for generating a random spanning tree (RST) of an unweighted
undirected network in $\tilde{O}(\sqrt{m}D)$ rounds with high probability. The approach is to simulate Aldous
and Broder's [2, 10] RST algorithm, which is as follows. First, pick one arbitrary node as a root.
Then, perform a random walk from the root node until all nodes are visited. For each non-root
node, output the edge that is used for its first visit. (That is, for each non-root node $v$, if the first
time $v$ is visited is $t$ then we output the edge $(u, v)$ where $u$ is the node visited at time $t - 1$.) The
output edges clearly form a spanning tree and this spanning tree is shown to come from a uniform
distribution among all spanning trees of the graph [2, 10]. The running time of this algorithm is
bounded by the time to visit all the nodes of the graph, which can be shown to be $O(mD)$ (in the
worst case, i.e., for any undirected, unweighted graph) by Aleliunas et al. [3].
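The sequential procedure described above can be sketched in Python as follows. This is a centralized illustration of the first-visit-edge rule only, not the distributed simulation; the function name `aldous_broder`, the adjacency-list representation, and the example graph are assumptions, and the input graph is assumed to be connected.

```python
import random

def aldous_broder(adj, root, rng):
    """Walk randomly from root until every node is visited; keep each first-visit edge.

    adj is an adjacency list of a connected undirected graph.
    Returns (tree_edges, number_of_steps_taken).
    """
    visited = {root}
    tree = []
    steps = 0
    u = root
    while len(visited) < len(adj):
        v = rng.choice(adj[u])      # one step of the simple random walk
        steps += 1
        if v not in visited:        # first visit to v: edge (u, v) joins the tree
            visited.add(v)
            tree.append((u, v))
        u = v
    return tree, steps

# Assumed example: a 4-cycle, rooted at node 0, with a fixed seed for repeatability.
cycle4 = [[1, 3], [0, 2], [1, 3], [0, 2]]
tree, steps = aldous_broder(cycle4, 0, random.Random(0))
print(tree, steps)
```

The number of steps taken is the cover time of this particular run, which is the quantity the distributed algorithm has to accommodate.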

This algorithm can be simulated on the distributed network by our random walk algorithm
as follows. The algorithm can be viewed in phases. Initially, we pick a root node arbitrarily and
set $\ell = n$. In each phase, we run $\log n$ (different) walks of length $\ell$ starting from the root node
(this takes $\tilde{O}(\sqrt{\ell D})$ rounds using our distributed random walk algorithm). If none of the $O(\log n)$
different walks cover all nodes (this can be easily checked in $O(D)$ time), we double the value of $\ell$
and start a new phase, i.e., perform again $\log n$ walks of length $\ell$. The algorithm continues until
one walk of length $\ell$ covers all nodes. We then use such a walk to construct a random spanning tree:
As the result of this walk, each node knows its position(s) in the walk (cf. Section 3), i.e., it has
a list of steps in the walk at which it is visited. Therefore, each non-root node can pick the edge that
is used in its first visit by communicating with its neighbors. Thus at the end of the algorithm, each
node can know which of its adjacent edges belong to the output tree. (An additional $O(n)$ rounds
may be used to deliver the resulting tree to a particular node if needed.)

We now analyze the number of rounds in terms of $\tau$, the expected cover time of the input graph.
The algorithm takes $O(\log\tau)$ phases before $2\tau \le \ell \le 4\tau$, and since one of $\log n$ random walks of
length $2\tau$ will cover the input graph with high probability, the algorithm will stop with $\ell \le 4\tau$ with
high probability. Since each phase takes $\tilde{O}(\sqrt{\tau D})$ rounds, the total number of rounds is $\tilde{O}(\sqrt{\tau D})$
with high probability. Since $\tau = O(mD)$, we have the following theorem.
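The phase structure (start with $\ell = n$, run about $\log n$ walks per phase, double $\ell$ until some walk covers the graph) can be sketched as below. The walks are simulated sequentially here, so the sketch shows only the doubling logic and says nothing about round complexity; the helper names, the rounding of $\log n$, and the example graph are assumptions.

```python
import math
import random

def walk_covers(adj, root, length, rng):
    """Return True if a random walk of the given length from root visits every node."""
    seen = {root}
    u = root
    for _ in range(length):
        u = rng.choice(adj[u])
        seen.add(u)
    return len(seen) == len(adj)

def doubling_cover_length(adj, root, rng):
    """Double ell (starting at n) until one of ~log n walks of length ell covers the graph."""
    n = len(adj)
    walks_per_phase = max(1, math.ceil(math.log2(n)))
    ell = n
    phases = 1
    while not any(walk_covers(adj, root, ell, rng) for _ in range(walks_per_phase)):
        ell *= 2
        phases += 1
    return ell, phases

# Assumed example: the complete graph on 4 nodes.
k4 = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
ell, phases = doubling_cover_length(k4, 0, random.Random(0))
print(ell, phases)
```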

Theorem 5.1. The algorithm described above generates a uniform random spanning tree in $\tilde{O}(\sqrt{m}D)$
rounds with high probability.

5.2 Decentralized Estimation of Mixing Time 

We now present an algorithm to estimate the mixing time of a graph from a specified source.
Throughout this section, we assume that the graph is connected and non-bipartite (the conditions
under which mixing time is well-defined). The main idea in estimating the mixing time is, given a
source node, to run many random walks of length $\ell$ using the approach described in the previous
section, and use these to estimate the distribution induced by the $\ell$-length random walk. We then
compare the distribution at length $\ell$ with the stationary distribution to determine if they are
close, and if not, double $\ell$ and retry. For this approach, one issue that we need to address is how
to compare two distributions with few samples efficiently (a well-studied problem). We introduce
some definitions before formalizing our approach and theorem.

Definition 5.2 (Distribution vector). Let $\pi_x(t)$ define the probability distribution vector reached
after $t$ steps when the initial distribution starts with probability 1 at node $x$. Let $\pi$ denote the
stationary distribution vector.

Definition 5.3 ($\tau^x(\delta)$ ($\delta$-near mixing time), and $\tau^x_{mix}$ (mixing time) for source $x$). Define $\tau^x(\delta) =
\min\{t : \|\pi_x(t) - \pi\|_1 < \delta\}$. Define $\tau^x_{mix} = \tau^x(1/2e)$.
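For small graphs, $\tau^x(\delta)$ can be computed directly from these definitions by propagating the distribution $\pi_x(t)$ step by step and stopping at the first $t$ with $\|\pi_x(t) - \pi\|_1 < \delta$. The Python sketch below does this for the simple (non-lazy) random walk; the function name `near_mixing_time`, the step cap `max_steps`, and the example graphs are assumptions made for illustration.

```python
import math

def near_mixing_time(adj, x, delta, max_steps=10_000):
    """Smallest t with ||pi_x(t) - pi||_1 < delta, or None if not reached within max_steps."""
    n = len(adj)
    two_m = sum(len(nbrs) for nbrs in adj)
    pi = [len(nbrs) / two_m for nbrs in adj]   # stationary distribution deg(v)/(2m)
    dist = [0.0] * n
    dist[x] = 1.0                               # pi_x(0): all mass at the source x
    for t in range(max_steps + 1):
        if sum(abs(a - b) for a, b in zip(dist, pi)) < delta:
            return t
        nxt = [0.0] * n
        for u in range(n):
            share = dist[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
    return None

# Assumed examples: a triangle (non-bipartite) and a 4-cycle (bipartite, never mixes).
triangle = [[1, 2], [0, 2], [0, 1]]
square = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(near_mixing_time(triangle, 0, 1 / (2 * math.e)))
print(near_mixing_time(square, 0, 0.5, max_steps=50))
```

On the triangle the $L_1$ distance halves at every step starting from $4/3$, so the threshold $1/2e$ is first crossed at $t = 3$; on the bipartite 4-cycle the distance stays at 1, which is why the section assumes non-bipartite graphs.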

The goal is to estimate $\tau^x_{mix}$. Notice that the definitions of $\tau^x(\delta)$ and $\tau^x_{mix}$ are consistent due
to the following standard monotonicity property of distributions.

Lemma 5.4. $\|\pi_x(t+1) - \pi\|_1 \le \|\pi_x(t) - \pi\|_1$.

Proof. We need to show that the definitions of mixing times are consistent, i.e., monotonic in $t$,
the number of steps of the random walk. This is folklore but for completeness, we show this
via simple linear algebra and the definition of distributions. Let $A$ denote the transpose of the
transition probability matrix of the graph being considered. That is, $A(i,j)$ denotes the probability
of transitioning from node $j$ to node $i$. Further, let $x$ denote any probability vector. Now notice
that we have $\|Ax\|_1 \le \|x\|_1$; this follows from the fact that the sum of entries of any column of $A$
is 1 (since it is a Markov chain), and the sum of entries of the vector $x$ is 1 (since it is a probability
distribution vector).

Now let $\pi$ be the stationary distribution of the graph corresponding to $A$. This implies that if
$\ell$ is $\delta$-near mixing, then $\|A^\ell u - \pi\|_1 \le \delta$, by the definition of $\delta$-near mixing time. Now consider
$\|A^{\ell+1}u - \pi\|_1$. This is equal to $\|A^{\ell+1}u - A\pi\|_1$ since $A\pi = \pi$. However, this reduces to
$\|A(A^\ell u - \pi)\|_1 \le \delta$ (which again follows from the fact that $A$ is stochastic). It follows that $(\ell + 1)$ is $\delta$-near
mixing. □

To compare two distributions, we use the technique of Batu et al. [7] to determine if the
distributions are $\delta$-near. Their result (slightly restated) is summarized in the following theorem.

Theorem 5.5 ([7]). For any $\epsilon$, given $\tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}))$ samples of a distribution $X$ over $[n]$, and a
specified distribution $Y$, there is a test that outputs PASS with high probability if $|X - Y|_1 \le \frac{\epsilon^3}{4\sqrt{n}\log n}$,
and outputs FAIL with high probability if $|X - Y|_1 \ge 6\epsilon$.

The distribution $X$ in our context is some distribution on nodes and $Y$ is the stationary distribution,
i.e., $Y(v) = \deg(v)/(2m)$ (recall that $m$ is the number of edges in the network). In
this case, the algorithm used in the above theorem can be simulated in a distributed network in
$\tilde{O}(D + 2/\log(1+\epsilon))$ rounds, as in the following theorem.

Theorem 5.6. For any $\epsilon$, given $\tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}))$ samples of a distribution $X$ over $[n]$, and a
stationary distribution $Y$, there is a $\tilde{O}(D + 2/\log(1+\epsilon))$-time test that outputs PASS with high
probability if $|X - Y|_1 \le \frac{\epsilon^3}{4\sqrt{n}\log n}$ and outputs FAIL with high probability if $|X - Y|_1 \ge 6\epsilon$.

Proof. We now give a brief description of the algorithm of Batu et al. [7] to illustrate that it
can in fact be simulated on the distributed network efficiently. The algorithm partitions the set of
nodes into $k$ buckets, where $k = (2/\log(1+\epsilon))\log n$, based on $Y$ (the stationary distribution in
this case). Denote these buckets by $R_1, \ldots, R_k$. Each bucket $R_i$ consists of all nodes $v$ such that
$\frac{(1+\epsilon)^{i-1}}{n\log n} \le Y(v) < \frac{(1+\epsilon)^{i}}{n\log n}$. Since $n$, $m$ and $\epsilon$ can be broadcast to all nodes in $O(D)$ rounds and
each node $v$ can compute its stationary distribution $Y(v) = \deg(v)/(2m)$, each node can determine
which bucket it is in in $O(D)$ rounds.

Now, we sample $\tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}))$ nodes based on distribution $X$. Each of the $\tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}))$
sampled nodes from $X$ falls in one of these buckets. We let $\ell_i$ be the number of sampled nodes
in bucket $R_i$ and let $Y(R_i)$ be the distribution of $Y$ on $R_i$. The values of $\ell_i$ and $Y(R_i)$, for all $i$,
can be computed and sent to some central node in $O(k) = \tilde{O}(2/\log(1+\epsilon))$ rounds. Finally, the central
node uses this information to determine the output of the algorithm. We refer the reader to [7] for
a precise description. □
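The local bucketing step can be sketched in Python: once a node knows $n$, $m$ and $\epsilon$, it computes $Y(v) = \deg(v)/(2m)$ and derives its bucket index without further communication. The sketch assumes geometric bucket boundaries of the form $(1+\epsilon)^{i-1}/(n\log n) \le Y(v) < (1+\epsilon)^{i}/(n\log n)$, base-2 logarithms, and an extra index 0 for probabilities below the first boundary; these choices and the helper names are assumptions, and [7] should be consulted for the precise scheme.

```python
import math

def bucket_index(prob, n, eps):
    """Index i >= 1 with (1+eps)^(i-1)/(n log n) <= prob < (1+eps)^i/(n log n).

    Returns 0 when prob is below the first boundary 1/(n log n) (assumed convention).
    """
    base = 1.0 / (n * math.log2(n))
    if prob < base:
        return 0
    return int(math.floor(math.log(prob / base, 1 + eps))) + 1

def stationary_buckets(adj, eps):
    """Bucket index of every node under Y(v) = deg(v)/(2m)."""
    n = len(adj)
    two_m = sum(len(nbrs) for nbrs in adj)
    return [bucket_index(len(nbrs) / two_m, n, eps) for nbrs in adj]

# Assumed example: a 16-node cycle, where every node has the same stationary probability.
cycle16 = [[(i - 1) % 16, (i + 1) % 16] for i in range(16)]
print(stationary_buckets(cycle16, 0.5))
print(bucket_index(0.05, 16, 1.0), bucket_index(0.01, 16, 1.0))
```

Probabilities that fall exactly on a bucket boundary are sensitive to floating-point rounding in this sketch, so a real implementation would want exact arithmetic there.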

Our algorithm starts with $\ell = 1$ and runs $K = \tilde{O}(\sqrt{n}\,\mathrm{polylog}(\epsilon^{-1}))$ walks (for choice of $\epsilon =
1/12e$) of length $\ell$ from the specified source $x$. As long as the test of comparison with the stationary
distribution outputs FAIL, $\ell$ is doubled. This process is repeated to identify the largest $\ell$ such that
the test outputs FAIL with high probability and the smallest $\ell$ such that the test outputs PASS
with high probability. These give lower and upper bounds on the required $\tau^x_{mix}$ respectively. Our
resulting theorem is presented below.
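The doubling search can be illustrated with a centralized Python sketch in which the sample-based PASS/FAIL test is replaced by an exact $L_1$ comparison against a threshold. That substitution is an assumption made purely to keep the example short and deterministic; it is not the tester of [7], and the function names, the cap `max_len`, and the example graph are likewise illustrative.

```python
import math

def walk_step(adj, dist):
    """Advance a probability distribution by one step of the simple random walk."""
    nxt = [0.0] * len(adj)
    for u, nbrs in enumerate(adj):
        share = dist[u] / len(nbrs)
        for v in nbrs:
            nxt[v] += share
    return nxt

def doubling_mixing_estimate(adj, x, threshold, max_len=1 << 16):
    """Try ell = 1, 2, 4, ...; return (largest failing ell, smallest passing ell).

    'Passing' means the exact L1 distance to stationarity is below threshold,
    a stand-in for the PASS outcome of the sample-based test.
    """
    two_m = sum(len(nbrs) for nbrs in adj)
    pi = [len(nbrs) / two_m for nbrs in adj]
    dist = [0.0] * len(adj)
    dist[x] = 1.0
    steps_done, ell, last_fail = 0, 1, 0
    while ell <= max_len:
        while steps_done < ell:          # extend the walk distribution to length ell
            dist = walk_step(adj, dist)
            steps_done += 1
        if sum(abs(a - b) for a, b in zip(dist, pi)) < threshold:
            return last_fail, ell
        last_fail = ell
        ell *= 2
    return last_fail, None

# Assumed example: a triangle, with the mixing threshold 1/(2e).
triangle = [[1, 2], [0, 2], [0, 1]]
print(doubling_mixing_estimate(triangle, 0, 1 / (2 * math.e)))
```

For the triangle the search brackets the mixing time between 2 and 4; by monotonicity (Lemma 5.4) a binary search inside that bracket would then pin down the exact value.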

Theorem 5.7. Given a graph with diameter $D$, a node $x$ can find, in $\tilde{O}(n^{1/2} + n^{1/4}\sqrt{D\tau^x(\delta)})$
rounds, a time $\tilde{\tau}^x_{mix}$ such that $\tau^x_{mix} \le \tilde{\tau}^x_{mix} \le \tau^x(\delta)$, where $\delta = \frac{1}{6912e\sqrt{n}\log n}$.

Proof. For undirected unweighted graphs, the stationary distribution of the random walk is known
and is $\frac{\deg(i)}{2m}$ for node $i$ with degree $\deg(i)$, where $m$ is the number of edges in the graph. If a
source node in the network knows the degree distribution, we only need $\tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}))$ samples
from a distribution to compare it to the stationary distribution. This can be achieved by running
Many-Random-Walks to obtain $K = \tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}))$ random walks. We choose $\epsilon = 1/12e$.
To find the approximate mixing time, we try out increasing values of $\ell$ that are powers of 2. Once
we find the right consecutive powers of 2, the monotonicity property admits a binary search to
determine the exact value for the specified $\epsilon$.

The result in [7] can also be adapted to compare with the stationary distribution even if the
source does not know the entire distribution. As described previously, the source only needs to
know the count of the number of nodes with stationary distribution in given buckets. Specifically, the
buckets of interest are at most $\tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}))$, as the count is required only for buckets where a
sample is drawn from. Since each node knows its own stationary probability (determined just by
its degree), the source can broadcast a specific bucket information and recover, in $O(D)$ steps, the
count of the number of nodes that fall into this bucket. Using upcast, the source can obtain the bucket
count for each of these at most $\tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}))$ buckets in $\tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}) + D)$ rounds.

By Theorem 4.1, a source node can obtain $K$ samples from $K$ independent random walks of
length $\ell$ in $\tilde{O}(K + \sqrt{K\ell D})$ rounds. Setting $K = \tilde{O}(n^{1/2}\mathrm{poly}(\epsilon^{-1}) + D)$ completes the proof. □

Suppose our estimate of $\tau^x_{mix}$ is close to the mixing time of the graph, defined as $\tau_{mix} =
\max_x \tau^x_{mix}$; then this would allow us to estimate several related quantities. Given a mixing time
$\tau_{mix}$, we can approximate the spectral gap $(1 - \lambda_2)$ and the conductance $(\Phi)$ due to the known
relations that $\frac{1}{1-\lambda_2} \le \tau_{mix} \le \frac{\log n}{1-\lambda_2}$ and $\Theta(1 - \lambda_2) \le \Phi \le \Theta(\sqrt{1 - \lambda_2})$, as shown in [33].

6 Concluding Remarks 

This paper gives a tight upper bound on the time complexity of distributed computation of random 
walks in undirected networks. Thus the running time of our algorithm is optimal (within a poly- 
logarithmic factor), matching the lower bound that was shown recently [50]. However, our upper 
bound for performing k independent random walks may not be tight and it will be interesting to 
resolve this. 

While the focus of this paper is on time complexity, message complexity is also important.
In particular, our message complexity for computing $k$ independent random walks of length $\ell$ is
$\tilde{O}(m\sqrt{\ell D} + n\sqrt{\ell/D})$, which can be worse than the naive algorithm's $O(k\ell)$ message complexity.
It would be important to come up with an algorithm that is round efficient and yet has smaller
message complexity. In a subsequent paper [18], we have addressed this issue partly and shown
that, under certain assumptions, we can extend our algorithms to be message efficient also.

We presented two algorithmic applications of our distributed random walk algorithm: estimating
mixing times and computing random spanning trees. It would be interesting to improve upon these
results. For example, is there a $\tilde{O}(\sqrt{\tau^x_{mix}} + n^{1/4})$ round algorithm to estimate $\tau^x_{mix}$; and is there an
algorithm for estimating the mixing time (which is the worst among all starting points)? Another
open question is whether there exists a $\tilde{O}(n)$ round (or a faster) algorithm for RST?

There are several interesting directions to take this work further. Can these techniques be
useful for estimating the second eigenvector of the transition matrix (useful for sparse cuts)? Are
there efficient distributed algorithms for random walks in directed graphs (useful for PageRank
and related quantities)? Finally, from a practical standpoint, it is important to develop algorithms
that are robust to failures and it would be nice to extend our techniques to handle such node/edge
failures. This can be useful for doing decentralized computation in large-scale dynamic networks.